Binaural Audio for Headphones using Ambisonics and Envelop for Live
Updated: Jun 10, 2020
The piece I am working on for this project uses audio spatialization as a key compositional element. A conventional stereo field offers two channels, one left and one right, within which audio can be panned. With multi-channel sound systems it is possible to route individual channels in a DAW (Digital Audio Workstation) to separate speakers and create a more immersive audio experience. With sound coming not only from the front, left and right, but also from behind, and possibly from above and below, the listeners find themselves in the center of a circular or spherical audio field.
One technology that solves the practical difficulties of panning individual channels across multiple speakers is called Ambisonics. It is a full-sphere surround sound format that was developed in the UK as early as the 1970s. It is a set of techniques for reconstructing a completely immersive sound field that emulates the way we hear naturally. Our brain identifies the directionality and location of audio by detecting the subtle differences between sound waves as they arrive at each ear. Ambisonics models these psychoacoustic principles digitally to create the perception of sound directionality. Over the years, the sound field reconstruction techniques have evolved into a high-fidelity format called higher-order ambisonics, which can virtually position sound in a 3D space. Rather surprisingly, it was not until recently, with the increased popularity of VR, 360 sound gaming and YouTube content, that ambisonics took off as a commercial success. Before that, it existed merely in niche applications and among recording enthusiasts.
Unlike other multichannel surround formats, the transmission channels of ambisonics do not carry speaker signals. Instead, they contain a speaker-independent representation of a sound field (a sphere) in which individual sounds can be panned. This representation is then decoded either to the listener's multi-speaker setup or to a two-track binaural audio file for headphones. This extra step of panning in a field rather than to specific speakers allows the producer to think in terms of source directions rather than loudspeaker positions, and offers the listener a considerable degree of flexibility as to the layout and number of speakers used for playback. In our case, we will decode to binaural.
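To make the idea of a speaker-independent representation a little more concrete, here is a minimal Python sketch of first-order ambisonics. It assumes the AmbiX convention (channel order W, Y, Z, X with SN3D normalization, azimuth measured counter-clockwise from the front), and the decoder is a deliberately naive projection onto each speaker direction. The function names are my own for illustration, and this is not a description of how E4L or any other plugin works internally.

```python
import math


def encode_first_order(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample into first-order channels (W, Y, Z, X).

    Assumes the AmbiX convention: positive azimuth is to the left,
    positive elevation is up.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample                                   # omnidirectional part
    y = sample * math.sin(az) * math.cos(el)     # left/right
    z = sample * math.sin(el)                    # up/down
    x = sample * math.cos(az) * math.cos(el)     # front/back
    return (w, y, z, x)


def decode_to_speaker(channels, azimuth_deg, elevation_deg=0.0):
    """Naive projection decode of (W, Y, Z, X) for one speaker direction.

    The result is the signal a cardioid microphone pointing at the
    speaker would pick up from the encoded sound field.
    """
    w, y, z, x = channels
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    sx = math.cos(az) * math.cos(el)
    sy = math.sin(az) * math.cos(el)
    sz = math.sin(el)
    return 0.5 * (w + x * sx + y * sy + z * sz)


if __name__ == "__main__":
    # One source at front-left, played back on a square of four speakers.
    field = encode_first_order(1.0, azimuth_deg=45.0)
    for speaker_az in (45.0, 135.0, -135.0, -45.0):
        gain = decode_to_speaker(field, speaker_az)
        print(f"speaker at {speaker_az:7.1f} deg: gain {gain:.2f}")
```

Note that the encoding step knows nothing about the speakers. The same four channels could just as well be decoded to a different layout, or to binaural. With this simple decoder, the speaker in the direction of the source receives the full signal, the opposite speaker receives nothing, and the two neighbours receive half each.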
And what exactly is binaural audio? It is essentially audio that has been recorded or digitally processed to mimic the way we hear things naturally, in 3D. Since we only have two ears, two channels are ultimately all that is needed to give us 3D sound, provided we use headphones. Binaural audio can be captured with a so-called binaural microphone, which uses a stereo pair of microphones, one in each ear of a dummy head. This way, sound is recorded very similarly to the way we would hear it. When played back over a pair of headphones, the brain is tricked into perceiving these sounds as if they were happening first-hand in a surrounding 3D space. This effect can also be obtained by panning signals using ambisonics plugins in a DAW. The binaural decoder then processes the sounds using so-called head-related transfer functions (HRTFs) to mimic natural sound waves emanating from a point in 3D space.
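One of the cues that an HRTF captures is the tiny difference in arrival time between the two ears. The Python sketch below estimates that difference with Woodworth's classic spherical-head formula and uses it to delay the far ear. The head radius, the speed of sound and the function names are assumptions of mine. A real HRTF also encodes level and spectral differences, so this is only a rough illustration of the principle and not what a binaural decoder actually does.

```python
import math

HEAD_RADIUS_M = 0.0875    # assumed average head radius
SPEED_OF_SOUND = 343.0    # metres per second, air at roughly 20 degrees C


def itd_seconds(azimuth_deg):
    """Interaural time difference for a source between -90 and 90 degrees.

    Woodworth's approximation: ITD = (r / c) * (theta + sin(theta)).
    """
    theta = math.radians(abs(azimuth_deg))
    return HEAD_RADIUS_M / SPEED_OF_SOUND * (theta + math.sin(theta))


def pan_by_delay(mono, azimuth_deg, sample_rate=48000):
    """Return (left, right) sample lists with the far ear delayed.

    Positive azimuth means the source is to the left, so the right ear
    hears it later. Both outputs are padded to the same length.
    """
    delay = round(itd_seconds(azimuth_deg) * sample_rate)
    near = list(mono) + [0.0] * delay
    far = [0.0] * delay + list(mono)
    if azimuth_deg >= 0:
        return near, far
    return far, near


if __name__ == "__main__":
    for az in (0, 30, 60, 90):
        print(f"{az:3d} deg: {itd_seconds(az) * 1000:.2f} ms")
```

For a sound coming directly from the side, the estimate is about 0.66 ms, which is only around 31 samples at 48 kHz. It is remarkable how much spatial information the brain extracts from such small differences.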
One such plugin suite for ambisonics audio is Envelop for Live (E4L), a set of free and open-source Max for Live devices for Ableton Live 10. These plugins bypass Ableton's standard stereo signal routing and process 16 channels of audio in the so-called third-order ambisonics domain. A source panner is placed on each track, and the sound can then be panned within the 3D field as either a mono or a stereo signal. The coordinates of the sound can also be automated or moved using LFOs. More information about how to use the E4L devices can be found here.
The E4L devices Source Panner and Multi Delay placed on a track in Ableton Live 10.
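The 16 channels mentioned above follow from the ambisonic order: a full-sphere signal of order N needs (N + 1)^2 channels. The short Python sketch below shows that relationship, together with a sine LFO sweeping a source's azimuth. The LFO parameter names and default values are hypothetical and are not taken from the E4L devices.

```python
import math


def ambisonic_channel_count(order):
    """Number of channels in a full-sphere ambisonic signal of a given order."""
    return (order + 1) ** 2


def lfo_azimuth(t_seconds, rate_hz=0.25, center_deg=0.0, depth_deg=90.0):
    """Azimuth in degrees at time t for a sine LFO.

    The source swings between center - depth and center + depth.
    All parameters are illustrative, not E4L parameter names.
    """
    return center_deg + depth_deg * math.sin(2 * math.pi * rate_hz * t_seconds)


if __name__ == "__main__":
    for order in (1, 2, 3):
        print(f"order {order}: {ambisonic_channel_count(order)} channels")
    # Sample the LFO once per second over one full four-second cycle.
    for t in range(5):
        print(f"t = {t} s: azimuth {lfo_azimuth(t):6.1f} deg")
```

First order needs 4 channels, second order 9 and third order 16, which is why higher orders quickly become demanding in terms of routing and processing.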
One feature that I appreciate in the E4L suite is the ease with which you can work with stereo signals within the 3D field. All the sounds I intend to use for this piece have been recorded with a stereo field recorder, and it would be a pity to make them mono. Working with stereo signals within the 3D field, where you can also dynamically alter the spread of the stereo image, will hopefully provide even more depth and nuance to the sound experience.
To render a binaural audio file, I would choose the settings shown in the image below and then simply render the E4L Master Bus track to a WAV file. The output routing settings can all be left at 'No Output'.
As previously stated, binaural 3D audio has gained popularity with VR, gaming and the somewhat obscure ASMR YouTube videos. However, in the context of spatial sound installations, sound art and music in general, it seems a bit underutilized. Personally, I find the combination of ambisonics encoding and binaural decoding very exciting, especially since headphones are such a common way of listening to recordings these days. And thanks to the speaker-independent nature of ambisonics production, a piece easily scales to different speaker setups, making it possible to exhibit it in different venues with only minor or no adjustments to the mix.
In case you were previously unfamiliar with ambisonics and binaural audio, I hope this post has provided an introduction and perhaps inspired you to start experimenting with spatial audio.