Binaural Rendering

Faking real 3D sound over ordinary headphones by recreating the exact cues your two ears would naturally receive.

One mono source becomes two different ear signals: the near ear hears it first and louder (ITD + ILD), then your brain rebuilds the 3D position.

What it is

Faking true 3D sound over normal headphones by recreating the exact timing, loudness and ear-shape cues your two ears would receive in real life.

Key facts

HRTF = Head-Related Transfer Function: the per-direction filter your head, ears and torso apply to sound; unique to each person.
Speed of sound in air = 343 m/s at 20 degrees C (about 1235 km/h).
ITD = Interaural Time Difference: arrival-time gap between ears, max ~0.65 ms (650 microseconds) when sound is 90 degrees to one side.
ILD = Interaural Level Difference: loudness gap between ears, up to ~20 dB at high frequencies from head shadow.
Duplex Theory (Rayleigh): ITD dominates localisation below ~1.5 kHz, ILD dominates above ~1.5 kHz.
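Pinna (outer-ear) notches/peaks in the 4-16 kHz band cue up/down and front/back; head size (~17-22 cm ear to ear) sets the maximum ITD.
Binaural needs a separate signal for each ear, so it works on isolated headphones - over speakers the left channel also reaches the right ear (crosstalk) and destroys the per-ear cues unless crosstalk cancellation is applied.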
+6 dB = double the sound pressure; -3 dB = half the power; inverse-square law drops level 6 dB per doubling of distance.
Convolution: each ear output = source audio convolved with that direction's HRIR (Head-Related Impulse Response); datasets include MIT KEMAR, CIPIC, SADIE II.
Head-tracking (Apple Spatial Audio, Dolby Atmos for Headphones) re-selects the HRTF as the head turns, so the scene stays locked to the world rather than to the head.
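
Several of the numbers above can be reproduced with a short calculation. The sketch below is illustrative only: it assumes the Woodworth spherical-head formula for ITD and a head radius of 8.75 cm (a 17.5 cm wide head), neither of which is stated in these notes, and the function names are made up for the example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at 20 degrees C (from the facts above)
HEAD_RADIUS = 0.0875    # m; assumed value, i.e. a 17.5 cm wide spherical head


def itd_woodworth(azimuth_deg, radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """ITD in seconds for a source at 0..90 degrees azimuth (0 = straight ahead).

    Woodworth spherical-head approximation: ITD = (r / c) * (theta + sin(theta)).
    """
    theta = math.radians(azimuth_deg)
    return (radius / c) * (theta + math.sin(theta))


def db_from_pressure_ratio(ratio):
    """Level change in dB for a sound-pressure (amplitude) ratio."""
    return 20.0 * math.log10(ratio)


def db_from_power_ratio(ratio):
    """Level change in dB for a power ratio."""
    return 10.0 * math.log10(ratio)


def level_change_db(distance_ratio):
    """Free-field level change when the distance is multiplied by distance_ratio."""
    return -20.0 * math.log10(distance_ratio)


print(f"ITD at 90 degrees: {itd_woodworth(90) * 1e6:.0f} microseconds")
print(f"Double the pressure: {db_from_pressure_ratio(2):+.2f} dB")
print(f"Half the power: {db_from_power_ratio(0.5):+.2f} dB")
print(f"Double the distance: {level_change_db(2):+.2f} dB")
```

With these assumed values the 90-degree ITD comes out close to the ~0.65 ms figure quoted above; a larger or smaller head radius shifts it proportionally.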

How it works

Capture or model how each ear colours a sound from every direction (the HRTF).
Take a mono source and split it into a left and right ear path.
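Set the timing gap (ITD) and loudness gap (ILD) for the target direction; measured HRIRs already contain both, so apply them as a separate step only with a simplified model or with HRIRs whose delay has been removed.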
Convolve each ear's signal with that direction's HRIR (head + pinna filtering).
Add early reflections and reverb for distance and room feel.
Feed L to left cup, R to right cup; add head-tracking so the scene stays fixed when the head turns.
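
The core of the steps above (split, per-ear delay and gain, convolution) can be sketched in a few lines of NumPy. This is a minimal illustration, not a production renderer: `toy_hrir_pair` is a hypothetical stand-in that encodes only a broadband ITD and a crude ILD, whereas a real renderer would load measured HRIRs from a dataset such as those named earlier. The 48 kHz sample rate, the 6 dB maximum ILD and the head radius are assumptions, and the reflections/reverb and head-tracking steps are left out.

```python
import numpy as np

FS = 48_000  # sample rate in Hz (assumed)


def toy_hrir_pair(azimuth_deg, fs=FS, length=64):
    """Hypothetical stand-in for a measured HRIR pair (NOT real HRTF data).

    Positive azimuth means the source is to the right; valid for -90..90.
    Returns (hrir_left, hrir_right).
    """
    theta = np.radians(abs(azimuth_deg))
    itd_s = (0.0875 / 343.0) * (theta + np.sin(theta))  # Woodworth-style ITD
    delay = int(round(itd_s * fs))                      # far-ear lag, whole samples
    far_gain = 10 ** (-6.0 * np.sin(theta) / 20)        # assumed: up to 6 dB quieter
    near = np.zeros(length)
    near[0] = 1.0               # near ear: immediate, full level
    far = np.zeros(length)
    far[delay] = far_gain       # far ear: later and quieter
    return (far, near) if azimuth_deg >= 0 else (near, far)


def render_binaural(mono, hrir_left, hrir_right):
    """Convolve one mono source with a left/right HRIR pair."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=1)  # shape: (samples, 2)


# Usage: a single click placed hard right.
click = np.zeros(256)
click[0] = 1.0
out = render_binaural(click, *toy_hrir_pair(90))
print("left-ear peak at sample", int(np.argmax(np.abs(out[:, 0]))))
print("right-ear peak at sample", int(np.argmax(np.abs(out[:, 1]))))
```

Because the toy filters carry no pinna detail, this places sounds left or right only; the up/down and front/back cues come from the spectral shape of measured HRIRs.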

Real examples

VR/AR audio - a footstep behind you in a headset sounds truly behind you.
Apple Spatial Audio and Dolby Atmos music rendered to AirPods with head-tracking.
ASMR and 3D YouTube videos recorded on a Neumann dummy head (KU 100).
Game audio engines (Steam Audio, Oculus Audio) doing real-time HRTF rendering.
Virtual surround - faking a 7.1 cinema mix down to two headphone drivers.

How it helps in live sound

Binaural is headphones only - it will NOT translate to a PA or wedges, so never mix the FOH show in binaural.
Use it for IEM/in-ear monitor 'virtual stage' mixes so performers hear bandmates positioned around them.
Offer binaural headphone streams for at-home/overflow audiences at hybrid events.
Record key moments on a dummy-head mic for immersive social/marketing clips.
Demo spatial mixes to clients on headphones, but A/B against the real speaker mix before sign-off.
Watch front/back confusion with generic HRTFs - add head-tracking or personalised profiles if budget allows.
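
The front/back confusion noted above, and why head-tracking helps, can be shown with a small calculation. The sketch assumes a spherical head (Woodworth-style ITD, 8.75 cm radius) in which a source in front and its mirror image behind produce the same ITD; the function names are hypothetical. A small head turn then moves the two ITDs in opposite directions, which is the extra information a head-tracked renderer can supply.

```python
import math


def relative_azimuth(source_az_deg, head_yaw_deg):
    """Source direction relative to the nose, wrapped to [-180, 180)."""
    return (source_az_deg - head_yaw_deg + 180.0) % 360.0 - 180.0


def itd_us(source_az_deg, head_yaw_deg=0.0, radius=0.0875, c=343.0):
    """Simplified ITD in microseconds; positive means the right ear leads.

    Azimuth and yaw are measured clockwise from straight ahead. Rear
    directions are folded onto the front, as a spherical head implies.
    """
    rel = math.radians(relative_azimuth(source_az_deg, head_yaw_deg))
    lateral = math.asin(math.sin(rel))  # mirror rear angles to the front
    return 1e6 * (radius / c) * (lateral + math.sin(lateral))


front, rear = 30.0, 150.0  # mirror-image directions, front-right and rear-right
print("head still:   ", round(itd_us(front)), round(itd_us(rear)))
print("head 10 right:", round(itd_us(front, 10.0)), round(itd_us(rear, 10.0)))
```

With the head still, the two ITDs match; after a 10-degree turn to the right the front source's ITD shrinks while the rear source's ITD grows.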

Everyday analogy

It's like fitting each ear with its own tiny stethoscope that hears exactly what a real ear would, so your brain swears the sound is genuinely behind or above you.

Watch out

Myth: binaural makes any playback 3D. Truth: it collapses to flat or weird on loudspeakers because crosstalk destroys the per-ear cues - it only works through isolated headphones.

Fun fact

Your brain can detect a timing gap as small as ~10 microseconds (millionths of a second) between your ears and uses it to place a sound left or right - far shorter than any gap you could consciously hear as two separate sounds.

Key takeaways

Binaural = recreate what each ear would receive: the timing gap (ITD) and loudness gap (ILD) between the ears, plus ear-shape (pinna) colouring, all captured by the HRTF.
HRTF/HRIR is the per-direction filter of head + pinna + torso; convolve audio with it to place a sound.
Headphones mandatory - speaker crosstalk kills the effect.
Pinna notches at 4-16 kHz cue up/down and front/back; head-tracking reduces front/back confusion.
Backbone of VR, 3D video and spatial music; great for IEMs and headphone streams, useless for the PA.