A head-related transfer function (HRTF) is the personal way your head, ears and shoulders reshape a sound before it reaches your eardrum, which tells your brain where the sound is coming from.
A sound from an angle hits the near ear first and louder, the pinna notches its highs, and the brain reads that personal fingerprint as a direction.
What it is
The unique tone-colour fingerprint your ear flaps, head and shoulders stamp on a sound so your brain knows its direction.
Key facts
HRTF = Head-Related Transfer Function: the filter (frequency + time changes) from sound source to eardrum for each direction.
Speed of sound in air ~343 m/s at 20 °C (about 1235 km/h).
ITD = Interaural Time Difference: max ~0.6-0.65 ms; sound hits the near ear first. Dominant cue below ~1.5 kHz.
ILD = Interaural Level Difference: head shadows the far ear, up to ~20 dB at high frequencies. Dominant cue above ~1.5-2 kHz.
Duplex theory (Lord Rayleigh, 1907): low freqs located by ITD, high freqs by ILD.
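Average adult head ~17.5 cm ear-to-ear; sound arriving from directly to one side has to curve around the head to reach the far ear, an extra path of ~22.5 cm, which sets the ~0.65 ms max ITD.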
Pinna (ear flap) notches/peaks live ~4-16 kHz; these spectral cues give UP/DOWN and FRONT/BACK.
Cone of confusion: ITD/ILD alone can't tell front from back or up from down. Pinna cues + small head turns resolve it.
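Wavelength: lambda = c / f. At 343 m/s, 1 kHz has a wavelength of 0.343 m (about twice the head width), so a ~17.5 cm head only casts a strong shadow once the wavelength shrinks toward head size, from roughly 2 kHz up.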
HRTFs are personal: ears differ like fingerprints, so generic HRTFs cause front-back errors and 'in-head' sound.
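The ITD and wavelength figures above can be checked with a little arithmetic. This is a sketch under stated assumptions: it uses Woodworth's spherical-head approximation, ITD = (a / c)(θ + sin θ), with an assumed head radius a = 8.75 cm (half the 17.5 cm ear-to-ear figure). Real heads are not spheres, so treat the results as ballpark values.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at 20 °C
HEAD_RADIUS = 0.0875    # m; assumed: half of a ~17.5 cm ear-to-ear width

def woodworth_itd(azimuth_deg: float, radius: float = HEAD_RADIUS,
                  c: float = SPEED_OF_SOUND) -> float:
    """Interaural time difference (seconds) for a distant source.

    Woodworth's spherical-head approximation, valid for azimuths 0..90 degrees
    (0 = straight ahead, 90 = directly to one side).
    """
    theta = math.radians(azimuth_deg)
    extra_path = radius * (theta + math.sin(theta))  # metres around the head
    return extra_path / c

def wavelength(freq_hz: float, c: float = SPEED_OF_SOUND) -> float:
    """Wavelength in metres: lambda = c / f."""
    return c / freq_hz

print(f"max ITD: {woodworth_itd(90) * 1000:.2f} ms")  # max ITD: 0.66 ms
for f in (500, 1000, 2000, 4000):
    print(f"{f} Hz -> {wavelength(f) * 100:.1f} cm")
```

The 90° case comes out at about 0.66 ms, in line with the ~0.65 ms figure. The wavelength loop puts 2 kHz at roughly head width (~17 cm), which is about where head shadowing (ILD) takes over from ITD.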
How it works
Sound leaves a source at an angle (azimuth = left/right, elevation = up/down).
It reaches the near ear first and louder; the head shadows the far ear (ITD + ILD).
Your pinna folds reflect and notch the high frequencies based on the up/down/front/back angle.
Shoulders and torso add a second, slightly delayed reflection.
The eardrum gets the filtered version; the brain compares both ears + the spectral notches.
The brain matches that fingerprint to a learned direction, and you 'hear' where the sound is.
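In signal terms, the steps above amount to filtering: each ear's signal is the source convolved with that ear's head-related impulse response (HRIR, the time-domain form of the HRTF). The sketch below uses made-up toy HRIRs (a direct spike, delayed and weaker at the far ear, plus one later, weaker "shoulder" spike) purely to show the mechanics. Real HRIRs are measured and also carry the pinna's high-frequency notches, which this toy leaves out. The sample rate, delays, gains and the `toy_hrir` helper are illustrative assumptions.

```python
SAMPLE_RATE = 48_000  # Hz, assumed

def convolve(signal, impulse_response):
    """Direct-form linear convolution (pure Python, fine for short examples)."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def toy_hrir(delay_samples, gain, shoulder_delay_samples, shoulder_gain):
    """Hypothetical HRIR: one direct spike plus one later shoulder reflection."""
    length = max(delay_samples, shoulder_delay_samples) + 1
    h = [0.0] * length
    h[delay_samples] += gain
    h[shoulder_delay_samples] += shoulder_gain
    return h

# Source to the listener's right: the right ear hears it first and louder.
# 29 samples at 48 kHz is ~0.6 ms of ITD; a far-ear gain of 0.4 is ~8 dB down.
hrir_right = toy_hrir(0, 1.0, 48, 0.2)
hrir_left = toy_hrir(29, 0.4, 77, 0.1)

click = [1.0] + [0.0] * 9  # a single-sample "clap"
left = convolve(click, hrir_left)
right = convolve(click, hrir_right)

# Compare when the click first arrives at each ear.
first_left = next(i for i, x in enumerate(left) if x != 0.0)
first_right = next(i for i, x in enumerate(right) if x != 0.0)
itd_ms = (first_left - first_right) / SAMPLE_RATE * 1000
print(f"left ear lags by {itd_ms:.2f} ms")
```

Swapping the toy HRIRs for a measured pair is what turns this delay-and-gain demo into real binaural rendering; the convolution step stays the same.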
Real examples
Close your eyes: you can still point at a clap because of the ITD/ILD between your two ears (plug one ear and left/right pointing gets much worse).
Binaural 'virtual barbershop' recordings made with a dummy head (e.g. KEMAR) make scissors feel like they circle your real head.
Apple's head-tracked Spatial Audio (including Dolby Atmos content) applies HRTF filtering so dialogue stays locked to the screen as you turn your head.
Gaming headsets and software (Dolby Atmos for Headphones, DTS Headphone:X) fake rear/overhead enemies through HRTF filtering on ordinary stereo cans.
A sound directly behind vs in front can fool you (cone of confusion) until you turn your head slightly.
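The front/back ambiguity and the head-turn fix can be shown numerically. This sketch assumes the simplest far-field model, ITD ≈ (d / c)·sin θ, with an assumed ear spacing d = 0.175 m and azimuth θ measured clockwise from straight ahead, and it ignores level and pinna cues. This straight-line model underestimates the maximum ITD (about 0.51 ms versus ~0.65 ms) because it ignores the path around the head, but spherical-head models share the same front/back symmetry.

```python
import math

EAR_SPACING = 0.175     # m, assumed ear-to-ear distance
SPEED_OF_SOUND = 343.0  # m/s

def simple_itd(azimuth_deg: float) -> float:
    """ITD in seconds; positive means the right ear leads.

    Azimuth is measured clockwise from straight ahead (90 = right, 180 = behind).
    """
    return EAR_SPACING / SPEED_OF_SOUND * math.sin(math.radians(azimuth_deg))

def itd_after_turn(source_azimuth_deg: float, head_turn_deg: float) -> float:
    """ITD once the head has turned clockwise (to the right) by head_turn_deg."""
    return simple_itd(source_azimuth_deg - head_turn_deg)

# 30 degrees right-front and 30 degrees right-rear (azimuth 150) mirror each
# other across the line through both ears, so their ITDs are identical.
front, back = 30.0, 150.0
print(f"front {simple_itd(front) * 1e6:.0f} us, back {simple_itd(back) * 1e6:.0f} us")

# Turning 10 degrees to the right moves the two candidates in opposite directions:
# the front source drifts toward the nose (ITD shrinks), the rear one toward the ear (ITD grows).
for name, az in (("front", front), ("back", back)):
    change = itd_after_turn(az, 10.0) - simple_itd(az)
    print(f"{name}: ITD change {change * 1e6:+.0f} us")
```

The sign of the change after a small turn is the extra information that breaks the tie, which is why a slight head turn resolves front from back.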
How it helps in live sound
For headphone monitor mixes / IEMs, an HRTF/binaural plugin (Waves Nx, dSONIQ Realphones) makes cans feel like real speakers and can reduce ear fatigue.
Dummy-head (binaural) capture for immersive content must be played on HEADPHONES, not a PA, or the cues collapse.
On a real PA you cannot deliver HRTF height cues by EQ; for true height use actual overhead/immersive arrays (L-ISA, d&b Soundscape).
Personalise where possible: generic HRTFs cause front-back flips, so let critical listeners pick a profile or measure their own.
Head-tracking is mandatory for convincing 3D over headphones; without it the scene rotates with the listener and breaks.
Mind the cone of confusion in show design: a single rear effect can read as front, so add motion or a visual cue to lock it.
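The head-tracking point comes down to one subtraction: to keep a source fixed in the room, the renderer must pick the HRTF for the source's direction relative to the head (world azimuth minus head yaw) on every tracker update. This is a minimal sketch, assuming a tracker that reports yaw in degrees clockwise; `render_angle` and `wrap_degrees` are hypothetical names for illustration, not any product's API.

```python
def wrap_degrees(angle: float) -> float:
    """Wrap an angle into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def render_angle(source_world_azimuth: float, head_yaw: float,
                 head_tracking: bool = True) -> float:
    """Azimuth (degrees, clockwise from the nose) at which to render a source.

    With tracking, the source stays put in the room as the head turns.
    Without it, the source is glued to the head and turns with the listener.
    """
    if not head_tracking:
        return wrap_degrees(source_world_azimuth)
    return wrap_degrees(source_world_azimuth - head_yaw)

# A screen straight ahead (0 degrees); the listener turns 40 degrees to the right.
print(render_angle(0.0, 40.0))         # -40.0: the screen now sits 40 degrees to the left
print(render_angle(0.0, 40.0, False))  # 0.0: untracked, it stays dead ahead and the scene turns with you
```

The returned angle would then select (or interpolate) the HRTF pair to apply, so the update has to run continuously at the tracker's rate.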
Everyday analogy
Like recognising who is knocking by the exact muffled sound through your own front door versus your window, your brain reads a sound's direction from the personal way your ears and head colour it.
Watch out
Myth: surround headphones have tiny extra speakers for each channel. Reality: most have just 2 drivers (one per ear) and an HRTF DSP filter that fakes direction by reshaping the sound like your ears would.
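The "two drivers plus DSP" idea can be sketched as a downmix: every input channel is filtered with the left-ear and right-ear responses for its virtual loudspeaker position, and the results are summed into just two outputs. In this hypothetical sketch each per-speaker "HRTF" is reduced to one delay and one gain per ear (ITD/ILD only, no pinna or shoulder filtering). The speaker angles, delays and gains in `VIRTUAL_SPEAKERS` are illustrative assumptions, not values from Dolby, DTS or any product.

```python
# Hypothetical per-speaker ear responses: (delay in samples at 48 kHz, gain).
VIRTUAL_SPEAKERS = {
    "L":  {"left": (0, 1.0),  "right": (12, 0.7)},  # front left, ~30 degrees
    "R":  {"left": (12, 0.7), "right": (0, 1.0)},   # front right, ~30 degrees
    "C":  {"left": (0, 0.8),  "right": (0, 0.8)},   # centre: identical at both ears
    "Ls": {"left": (0, 1.0),  "right": (27, 0.4)},  # surround left, ~110 degrees
    "Rs": {"left": (27, 0.4), "right": (0, 1.0)},   # surround right, ~110 degrees
}

def delay_and_gain(signal, delay, gain, length):
    """Return `signal` delayed by `delay` samples, scaled, zero-padded to `length`."""
    out = [0.0] * length
    for i, s in enumerate(signal):
        if i + delay < length:
            out[i + delay] = s * gain
    return out

def virtual_surround(channels):
    """Fold any number of named channels down to a (left, right) headphone pair."""
    max_delay = max(max(ears["left"][0], ears["right"][0])
                    for ears in VIRTUAL_SPEAKERS.values())
    length = max(len(sig) for sig in channels.values()) + max_delay
    left, right = [0.0] * length, [0.0] * length
    for name, signal in channels.items():
        ears = VIRTUAL_SPEAKERS[name]
        for out, ear in ((left, "left"), (right, "right")):
            delay, gain = ears[ear]
            for i, x in enumerate(delay_and_gain(signal, delay, gain, length)):
                out[i] += x  # every channel ends up in the same two drivers
    return left, right

# A click on surround-right only: it reaches the left ear 27 samples later and quieter.
click = [1.0] + [0.0] * 4
left, right = virtual_surround({"Rs": click})
print(left.index(0.4), right.index(1.0))
```

Because this toy uses only delay and gain, it cannot tell the ~110° surround positions from sources at ~70° in front: the cone-of-confusion problem that the pinna filtering in a real HRTF is there to solve.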
Fun fact
Your brain re-learns your HRTF if your ears change shape: in studies, people fitted with ear moulds lost up/down localisation, then re-learned it within weeks, and kept BOTH the old and new maps.
Key takeaways
HRTF = the personal direction-filter of your head, ears and shoulders.
Two ears give left/right (ITD below 1.5 kHz, ILD above); pinna gives up/down and front/back.
Max ear-to-ear time gap is only ~0.65 ms, yet that is enough to point at a sound.
3D headphone audio is just HRTF maths faking the cues your real ears would make.
Generic HRTFs cause errors; personal HRTFs + head-tracking make it convincing.