Spatial Audio for Headphones using Ambisonics
Updated: May 11
The piece I am working on for this project uses audio spatialization as a compositional element. A conventional stereo field offers two channels, one left and one right, between which audio can be panned. With multi-channel sound systems it is possible to route individual channels in a DAW (Digital Audio Workstation) to separate speakers and create a more immersive audio experience than stereo allows. With sound coming not only from the front, left and right, but also from behind, above and below, the listener finds herself in the center of a spherical audio field.
One technology which solves the practical difficulties of panning individual channels across multiple speakers is called Ambisonics. It is a full-sphere surround sound format that was developed in the UK as early as the 1970s, and it comprises a set of techniques for reconstructing a completely immersive sound field that emulates the way we hear naturally. Our brain identifies the direction and location of a sound by detecting the subtle differences between the sound waves as they arrive at each ear. Ambisonics models these psychoacoustic principles digitally to create the perception of sound directionality. Over the years, the sound field reconstruction techniques have evolved into a high-fidelity format called higher-order ambisonics that can virtually position sound in 3D space. Quite surprisingly, it is only recently, with the increased popularity of VR, 360-degree sound, gaming and YouTube content, that ambisonics has taken off as a commercial success. Earlier it existed merely in niche applications and among recording enthusiasts.
Unlike other multichannel surround formats, ambisonics' transmission channels do not carry speaker signals. Instead, they contain a speaker-independent representation of a sound field (a sphere) in which individual sounds can be panned. This representation is then decoded either to the listener's speaker setup or to a two-track binaural audio file for headphones. The extra step of panning in a field rather than to specific speakers allows the producer to think in terms of source directions rather than loudspeaker positions, and offers the listener considerable flexibility as to the layout and number of speakers used for playback. In our case, we will decode to binaural. And what exactly is binaural audio?
It is essentially digital manipulation of audio to mimic the way we hear things naturally, in 3D. Since we only have two ears, no more than two channels are ultimately needed to give us 3D sound, as long as we are listening on headphones. Binaural audio can be captured with so-called binaural microphones. These use a stereo pair of microphones, one in each ear of a dummy head. This way sound is recorded in much the same way as we would hear it. When it is played back on a pair of headphones, the brain is tricked into perceiving the sounds as if they were happening first hand in a surrounding 3D space. This effect can also be obtained by panning signals using ambisonics plugins in a DAW. The binaural decoder then processes the sounds using so-called head-related transfer functions (HRTFs) to mimic natural sound waves emanating from a point in 3D space.
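To give a feel for how small the differences between the two ears are, here is a minimal Python sketch of one single binaural cue, the interaural time difference (ITD). It uses Woodworth's classic spherical-head approximation, which is my own choice for illustration and not something taken from E4L or any particular decoder. The head radius and speed of sound are assumed typical values, and a real HRTF captures far more than this (level differences and the filtering of the outer ear, for instance).

```python
import math


def interaural_time_difference(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """Approximate ITD in seconds using Woodworth's spherical-head formula.

    azimuth_deg: angle of the source away from straight ahead, 0 to 90 degrees.
    head_radius_m and speed_of_sound are assumed typical values, not measurements.
    """
    theta = math.radians(azimuth_deg)
    # Extra path length around a rigid sphere: r * (theta + sin(theta))
    return (head_radius_m / speed_of_sound) * (theta + math.sin(theta))


if __name__ == "__main__":
    for azimuth in (0, 30, 60, 90):
        itd_microseconds = interaural_time_difference(azimuth) * 1e6
        print(f"{azimuth:>2} degrees: {itd_microseconds:.0f} microseconds")
```

With these assumed values, a source directly to one side arrives at the far ear only about two thirds of a millisecond later than at the near ear, and that is already enough for the brain to place it firmly on that side.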
One such plugin suite for ambisonic audio is Envelop For Live (E4L), a free and open-source set of Max for Live devices for Ableton Live 10. These plugins bypass Ableton's standard stereo signal routing and process 16 channels of audio in the so-called third-order ambisonics domain. A source panner is placed on each track, and the sound can then be panned within the 3D field as either a mono or a stereo signal. The coordinates of the sound can also be automated or moved using LFOs. More information about how to use the E4L devices can be found here.
The E4L devices Source Panner and Multi Delay placed on a track in Ableton Live 10.
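For readers who like to see the idea in code, the following Python sketch illustrates what panning into a speaker-independent field means. It is not how E4L is implemented: it works in first order (4 channels) rather than third order (16 channels), it assumes the common AmbiX convention (ACN channel order W, Y, Z, X with SN3D normalization, azimuth measured counterclockwise from the front), and it decodes with a simple virtual cardioid microphone pointed at each speaker, which is far cruder than a real decoder. All function names are hypothetical.

```python
import math


def ambisonic_channel_count(order):
    """Number of channels needed for a full-sphere ambisonic signal of a given order."""
    return (order + 1) ** 2


def encode_first_order(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample into first-order ambisonics (assumed AmbiX: W, Y, Z, X)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample                                  # omnidirectional component
    y = sample * math.sin(az) * math.cos(el)    # left-right component
    z = sample * math.sin(el)                   # up-down component
    x = sample * math.cos(az) * math.cos(el)    # front-back component
    return (w, y, z, x)


def decode_virtual_cardioid(encoded, speaker_azimuth_deg, speaker_elevation_deg=0.0):
    """Speaker feed from a virtual cardioid microphone aimed at the speaker direction."""
    w, y, z, x = encoded
    az = math.radians(speaker_azimuth_deg)
    el = math.radians(speaker_elevation_deg)
    ux = math.cos(az) * math.cos(el)
    uy = math.sin(az) * math.cos(el)
    uz = math.sin(el)
    return 0.5 * (w + x * ux + y * uy + z * uz)


# Pan a source to the front left, without knowing anything about the speakers yet.
encoded = encode_first_order(1.0, azimuth_deg=45.0)

# Only at decode time do we choose a layout, here a hypothetical square of four speakers.
square = {"front-left": 45.0, "rear-left": 135.0, "rear-right": -135.0, "front-right": -45.0}
feeds = {name: decode_virtual_cardioid(encoded, az) for name, az in square.items()}

if __name__ == "__main__":
    print("third-order channels:", ambisonic_channel_count(3))
    for name, gain in feeds.items():
        print(f"{name}: {gain:.2f}")
```

The source direction is fixed at the encoding stage, while the speaker layout only enters at the decoding stage. Swapping the square for another layout requires no change to the encoded signal, which is the flexibility described above.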
One feature that I appreciate in the E4L suite is the ease with which you can work with stereo signals within the 3D field. All the sounds I intend to use for this piece have been recorded with a stereo field recorder, and it would be a pity to make them mono. Working with stereo signals within the 3D field, where you can also dynamically alter the spread of the stereo image, will hopefully provide even more depth and nuance to the sound experience.
In order to render a binaural audio file, I would choose the settings shown in the image below and then simply render the E4L Master Bus track to a wav file. The output routing settings can all be left at 'No Output'.
As previously stated, binaural 3D audio has gained in popularity with VR, gaming and the somewhat obscure ASMR YouTube videos. However, in the context of spatial sound installations, sound art and music in general, it seems a bit underutilized. Personally, I find the combination of ambisonics encoding and binaural decoding very exciting and inspiring, especially since headphones are such a common way of listening to recordings these days. And thanks to the speaker-independent nature of ambisonics production, a piece easily scales up and down to different speaker setups, making it possible to exhibit a piece in different venues without having to make major changes to the mix.