libspatialaudio 0.4: a modern spatial audio library

21 December 2025

We just released libspatialaudio 0.4.0, the first true major release.

Rendering spatial audio is genuinely complex today: inputs range from stereo to Higher-Order Ambisonics (HOA) and dynamic objects, while outputs vary from headphones to immersive multi-speaker rooms.
The same audio mix may need to play on:

  • a phone with headphones (binaural, optionally with head tracking),
  • a laptop (stereo downmix with coherent spatial cues),
  • a living room with a classic 5.1 or 7.1 setup,
  • or a full 7.1.4 or 9.x.x immersive speaker layout.

libspatialaudio implements precisely this rendering step. Independent of the audio codec, it provides the rendering logic to take spatial audio representations (Higher-Order Ambisonics, objects, direct speakers) and produce a target output that matches the user's device and speaker layout, in real time, with an API that can be embedded in players, streaming stacks or transcoding toolchains.

The library also supports binauralization with custom HRTF files, and it is structured so you can feed multiple spatial stream types and request one target output layout.

libspatialaudio is, of course, open-source, cross-platform and written in C++, like so many other VideoLAN/FFmpeg-related projects.

History

libspatialaudio started as a fork of ambisonic-lib by Aristotel Digenis, which we modified significantly when we adopted it for spatial audio rendering in VLC in 2017.

At that time, Ambisonics was the focus of all the VR and Cardboard setups, where head rotation and sound-field manipulation are core needs, before that 3D/VR trend died.

Over time, we grew the library beyond Ambisonics into a full spatial audio renderer, adding ADM-style concepts and full support for object rendering, while still supporting HOA and direct speakers.
It is now complete for next-gen immersive audio distribution and playback, including formats like IAMF.

The project lives here: https://github.com/videolabs/libspatialaudio/.

Architecture overview: four building blocks

I will outline the broad architecture of the library here, but you should really read the details in the technical report.

1) Higher-Order Ambisonics (HOA): encode → transform → decode

Ambisonics is the “sound field” approach: sources are encoded into spherical-harmonic components, optionally transformed (rotation, zoom), then decoded to a target playback layout.

libspatialaudio provides the classic set of processors (Encoder, Rotator, Zoomer, Decoders), supporting up to 3rd order, i.e. 16 channels.

A few implementation details matter in production:

  • The HOA signals use the AmbiX conventions (ACN channel ordering and SN3D normalization), with conversion from other conventions handled as needed in a pipeline;
  • Real-time updates require smoothing: the encoder uses gain interpolation to reduce “zipper noise” during motion;
  • For decoding to irregular or standard layouts, AllRAD is a key method: decode to a regular virtual layout and then pan to the real layout via VBAP. This is a practical bridge between HOA and “real” loudspeaker setups.
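To make the conventions above concrete, here is a minimal, self-contained sketch of first-order AmbiX encoding (ACN channel order W, Y, Z, X with SN3D normalisation). The function name and signature are illustrative, not the library's API; in production the gains would also be interpolated across the block to avoid the zipper noise mentioned above.

```cpp
#include <array>
#include <cmath>

// Encode one mono sample into first-order AmbiX (ACN order W,Y,Z,X; SN3D).
// Azimuth is counter-clockwise from the front, elevation upwards, in radians.
std::array<float, 4> encodeAmbiX1(float sample, float azimuth, float elevation)
{
    const float cosEl = std::cos(elevation);
    return {
        sample,                              // ACN 0: W (omnidirectional)
        sample * std::sin(azimuth) * cosEl,  // ACN 1: Y (left)
        sample * std::sin(elevation),        // ACN 2: Z (up)
        sample * std::cos(azimuth) * cosEl   // ACN 3: X (front)
    };
}
```

A source straight ahead (azimuth 0, elevation 0) produces energy only in W and X; higher orders simply add more spherical-harmonic channels, up to 16 at 3rd order.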

For head tracking, the “rotate in the HOA domain” approach is typically what you want: rotate the sound field once, after summing sources, rather than per-source.
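Why rotating in the HOA domain is cheap can be seen in a first-order sketch (again illustrative code, not the library API): a yaw rotation is a small matrix applied once per frame to the summed sound field, regardless of how many sources went into it.

```cpp
#include <array>
#include <cmath>

// Rotate a first-order AmbiX frame (ACN order W,Y,Z,X) around the vertical
// axis by `yaw` radians. W (omni) and Z (height) are untouched: a yaw
// rotation only mixes the horizontal components Y and X.
std::array<float, 4> rotateYawAmbiX1(const std::array<float, 4>& in, float yaw)
{
    const float c = std::cos(yaw);
    const float s = std::sin(yaw);
    return {
        in[0],                  // W unchanged
        in[1] * c + in[3] * s,  // Y' = Y*cos(yaw) + X*sin(yaw)
        in[2],                  // Z unchanged
        in[3] * c - in[1] * s   // X' = X*cos(yaw) - Y*sin(yaw)
    };
}
```

Rotating a front-facing source by 90° moves its energy from the X channel to the Y channel, i.e. to the listener's left, exactly as head tracking requires.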

Finally, note that libspatialaudio can also encode HOA, which is useful when building a sound field from discrete sources.

2) Loudspeaker binauralization: multichannel → headphone rendering

For headphones, libspatialaudio offers binaural rendering via HRIR/HRTF convolution:

  • Each loudspeaker channel is convolved with a left/right HRIR pair corresponding to its direction, then summed;
  • The implementation uses frequency-domain convolution (overlap-add);
  • It can load HRTFs in SOFA form via libmysofa, with a built-in option (MIT HRTF) available as a default.

If you can, you should load your own HRTF for optimal audio rendering. Yes, I know it’s hard to get a custom SOFA file :)

3) Object spatialisation: mono source → loudspeaker gains

Object-based audio is the “source + metadata” approach: each object is a mono signal associated with a position and potentially additional parameters (width, divergence, diffuseness, etc.).

libspatialaudio provides:

  • a lower-level ObjectPanner implementing VBAP (Vector Base Amplitude Panning), and
  • a higher-level Renderer path that supports metadata features used by modern renderers (channel locking, divergence, spread/extent, zone exclusion, direct/diffuse energy split, decorrelation for diffuse components).
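As a flavour of what VBAP does (a self-contained 2D sketch under my own naming, not the ObjectPanner API), the gains for a source between two loudspeakers come from inverting the speaker-direction matrix, then power-normalising:

```cpp
#include <array>
#include <cmath>

// 2D VBAP: amplitude gains for a source panned between two loudspeakers.
// All angles in radians, counter-clockwise from the front.
std::array<float, 2> vbap2dGains(float srcAz, float spk1Az, float spk2Az)
{
    // Unit vectors (x = front, y = left) for the source and both speakers.
    const float px = std::cos(srcAz),  py = std::sin(srcAz);
    const float ax = std::cos(spk1Az), ay = std::sin(spk1Az);
    const float bx = std::cos(spk2Az), by = std::sin(spk2Az);

    // Solve p = g1*l1 + g2*l2 by inverting the 2x2 speaker matrix.
    const float det = ax * by - ay * bx;
    float g1 = (px * by - py * bx) / det;
    float g2 = (py * ax - px * ay) / det;

    // Power normalisation so that g1^2 + g2^2 == 1.
    const float norm = std::sqrt(g1 * g1 + g2 * g2);
    return { g1 / norm, g2 / norm };
}
```

A source exactly between two speakers gets equal gains of 1/√2 each; 3D VBAP extends this to triplets of speakers, and the Renderer layers the metadata features (spread, divergence, zone exclusion, etc.) on top of these base gains.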

This enables full object rendering suitable for Next-Gen Audio object workflows, like ADM objects, IAMF v2 or, potentially, other formats like MPEG-H or Dolby Atmos.

4) Metadata-based rendering: the Renderer class

The Renderer is the unifying layer. It accepts multiple stream types:

  • DirectSpeaker streams (target a specific speaker if present; otherwise spatialise appropriately);
  • HOA streams (summed and decoded);
  • Objects (mono + metadata).

It targets a variety of loudspeaker layouts, including common ITU-style systems and new additional layouts for immersive playback.

One notable design choice is the binaural metadata rendering path: instead of requiring a large BRIR dataset, libspatialaudio renders objects/direct speakers to a virtual loudspeaker layout, re-encodes those signals to Ambisonics, sums with HOA, applies head rotation in the HOA domain, then decodes to binaural. This keeps computation tractable and integration simpler, with a clear and explicit trade-off compared to BRIR-heavy approaches.

What’s new in libspatialaudio 0.4.0

The release motto could be: “libspatialaudio grows from a HOA-focused library into a general-purpose spatial audio renderer with a unified API”.

This is the first production release where the library, as a whole, is meant to cover the full set of use cases for rendering immersive audio.

A stronger focus on IAMF

If ADM has been the traditional open reference point for broadcast and professional interchange, IAMF (Immersive Audio Model and Formats) is an open specification aimed at internet distribution, developed within the Alliance for Open Media.

IAMF’s key appeal is that it standardises packaging and rendering intent for immersive audio in a way that fits streaming and broad deployment, while avoiding proprietary ecosystems.

This format is a great fit for libspatialaudio and VideoLAN.
The renewal of Next-Gen Audio (NGA) and of spatial audio, with new soundbars and AirPods-like headphones, means that we are going to see more and more spatial audio formats, and IAMF fills the need for a metadata format that is agnostic of the codecs.

libspatialaudio has been updated to ensure compatibility with the features of IAMF (v1, v1.1 and the upcoming v2).
In addition, the AllRAD/VBAP-based decoding and the metadata-aware object rendering path are a good match for the “rendering anywhere” deployment model IAMF is designed for.

Functional highlights and ABI

  • Object-based audio support, with a renderer designed for metadata-driven immersive workflows.
  • A unified Renderer that handles HOA, objects, direct speakers, and binaural output under a single integration point.
  • HOA improvements, including decoding to common standard loudspeaker layouts.
  • Real-time and efficiency improvements, focusing on practical use in players and interactive systems.
  • Documentation upgrades, with more DSP background and examples.
  • API and ABI are broken versus previous releases:
    • classes were renamed and moved into the spaudio namespace,
    • headers were reorganised to avoid conflicts and improve usability;
  • Meson support is now available as a first-class build option.

If you integrate libspatialaudio in an existing codebase, expect to touch include paths, namespaces, and some class names. In exchange, the API surface is cleaner and more neutral with regard to formats, and the Renderer becomes the obvious entry point for most pipelines.

License

libspatialaudio is LGPL v2.1 (or later) and is also available under a commercial license.

Conclusion

libspatialaudio 0.4 is not a minor bump: it is the first major production release of the project and focuses on a unified renderer that can handle HOA, objects, and direct speakers, and output to a wide range of speaker layouts, including binauralization. It is great for Ambisonics, Objects, ADM, and IAMF-style pipelines but also for spatialising more classical inputs in a predictable way.

We look forward to your feedback, and patches.

Jean-Baptiste Kempf