Auditory Scene Analysis

How your brain sorts one messy wall of sound into separate things you can recognise.

One summed pressure wave enters the ear; the brain uses harmonicity, common onset and direction to rebuild it into separate kick, voice and guitar streams.

What it is

Your brain's automatic process of splitting one mixed sound wave into separate, recognisable sources like voice, kick and guitar.

Key facts

One eardrum gets ONE summed pressure wave; the brain reverse-engineers it into many sources (the 'cocktail party' problem, Colin Cherry 1953)
Term 'Auditory Scene Analysis' (ASA) from Albert Bregman, 1990 book of that name
Grouping splits into SIMULTANEOUS (vertical, what's playing now) and SEQUENTIAL (horizontal, streaming over time)
Harmonicity cue: partials that are whole-number multiples of one fundamental (f0, 2f0, 3f0...) FUSE into one source; mistune one ~3% and it pops out
Common onset: partials that start together (within ~30-40 ms) fuse into one source
Pitch separation: at a fast tempo, notes alternating ~3-6 semitones apart split one line into two streams
Direction from ITD (interaural time difference, up to ~0.6-0.7 ms) and ILD (interaural level difference) between the two ears
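Precedence (Haas) effect: a copy delayed ~1-35 ms is heard as ONE event located at the FIRST arrival; beyond ~35-50 ms it is heard as a separate echo (the exact limit depends on the material, shorter for clicks than for speech or music)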
Speed of sound ~343 m/s at 20 °C; a 1 ms delay = ~34 cm path difference
+6 dB = double the PRESSURE (2x voltage); +3 dB = double the POWER; doubling the distance from a point source in free field drops the level ~6 dB (inverse-square law)
Cochlea has ~24 critical bands (Bark scale); sources with energy in the same band MASK each other, so EQ that moves each source's main energy into a different band helps separate them
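The arithmetic behind these key facts can be checked with a few lines of Python. This is a minimal sketch, assuming air at about 20 °C (343 m/s), a point source in free field for the distance rule, a spherical head of radius 8.75 cm for the ITD estimate (Woodworth's formula), and the Zwicker & Terhardt approximation for converting hertz to Bark. All function names are illustrative, not from any library.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at about 20 °C

def delay_ms_to_path_cm(delay_ms: float) -> float:
    """Extra path length (cm) that an arrival delay (ms) corresponds to."""
    return SPEED_OF_SOUND_M_S * (delay_ms / 1000.0) * 100.0

def pressure_ratio_to_db(ratio: float) -> float:
    """Level change for a pressure (or voltage) ratio: 20 * log10."""
    return 20.0 * math.log10(ratio)

def power_ratio_to_db(ratio: float) -> float:
    """Level change for a power (or intensity) ratio: 10 * log10."""
    return 10.0 * math.log10(ratio)

def distance_loss_db(r_near_m: float, r_far_m: float) -> float:
    """Free-field point-source level drop moving from r_near_m to r_far_m.
    Intensity falls as 1/r^2, so pressure level falls 20*log10(r_far/r_near)."""
    return 20.0 * math.log10(r_far_m / r_near_m)

def woodworth_itd_ms(azimuth_deg: float, head_radius_m: float = 0.0875) -> float:
    """Spherical-head ITD estimate for a distant source, valid for 0-90 degrees.
    The 8.75 cm head radius is an assumed typical value."""
    theta = math.radians(azimuth_deg)
    return head_radius_m / SPEED_OF_SOUND_M_S * (theta + math.sin(theta)) * 1000.0

def hz_to_bark(f_hz: float) -> float:
    """Zwicker & Terhardt approximation of critical-band rate (Bark)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)
```

For example, delay_ms_to_path_cm(1.0) returns about 34.3, matching the 1 ms ≈ 34 cm rule, and woodworth_itd_ms(90) comes out near 0.66 ms, inside the ~0.6-0.7 ms ITD range. hz_to_bark(1000) is about 8.5, and the scale reaches about 24 Bark near 15.5 kHz.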

How it works

Ears send one summed waveform; cochlea splits it by frequency into critical bands.
Brain groups partials that share a fundamental (harmonicity) into one source.
Bits that start and stop together (common onset) get fused into the same object.
Pitch, timbre and direction link sounds over time into separate 'streams'.
ITD and ILD between ears assign each stream a location in space.
You consciously hear distinct objects: voice here, kick there, guitar over there.
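As a toy illustration of the harmonicity and common-onset steps, the Python sketch below sorts a list of partials into those that fuse with one candidate source and those that pop out. It assumes the candidate fundamental and onset are already known, and it uses the ~3% mistuning and ~30 ms onset figures from the key facts as hard thresholds. The brain does nothing this crisp, and the names Partial, harmonic_mismatch and group_partials are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Partial:
    freq_hz: float   # frequency of one sinusoidal component
    onset_ms: float  # when that component starts

def harmonic_mismatch(freq_hz: float, f0_hz: float) -> float:
    """Relative distance from freq_hz to the nearest whole-number multiple
    of f0_hz (0.03 means 3% off)."""
    n = max(1, round(freq_hz / f0_hz))
    return abs(freq_hz - n * f0_hz) / (n * f0_hz)

def group_partials(partials, f0_hz, onset_ms, mistune_tol=0.03, onset_tol_ms=30.0):
    """Toy grouping: a partial joins the source only if it is harmonic
    (within mistune_tol of some n * f0) AND starts within onset_tol_ms of it."""
    fused, segregated = [], []
    for p in partials:
        is_harmonic = harmonic_mismatch(p.freq_hz, f0_hz) < mistune_tol
        is_synchronous = abs(p.onset_ms - onset_ms) <= onset_tol_ms
        (fused if is_harmonic and is_synchronous else segregated).append(p)
    return fused, segregated

partials = [
    Partial(200.0, 0.0),   # f0
    Partial(400.0, 5.0),   # 2 * f0, starts 5 ms late: still fuses
    Partial(624.0, 0.0),   # 3 * f0 mistuned by 4%: pops out
    Partial(800.0, 80.0),  # 4 * f0, but starts 80 ms late: pops out
]
fused, segregated = group_partials(partials, f0_hz=200.0, onset_ms=0.0)
print([p.freq_hz for p in fused])       # [200.0, 400.0]
print([p.freq_hz for p in segregated])  # [624.0, 800.0]
```

Requiring both cues at once is a simplification chosen for readability; in real listening the cues trade off against each other rather than acting as a strict AND.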

Real examples

At a party you follow one talker and ignore the rest (the cocktail party effect).
A choir sounds like many voices, not one, because each singer's pitch wobbles independently.
Mistuning one violin in a unison makes that player suddenly stand out from the section.
Pan a hi-hat right and a shaker left and the ear hears two clear parts, not mush.
A short ~20 ms doubling delay makes a voice sound bigger yet still one source, thanks to the precedence effect.

How it helps in live sound

Give each source its own EQ lane: HPF vocals ~80-120 Hz, kick owns 50-100 Hz, so they stop sharing critical bands.
Use PANNING to hand sources different ITD/ILD positions; mono-piling everything kills separation.
Set delay/reverb pre-delay 15-35 ms so effects fatten without becoming a separate echo (precedence effect).
Time stage-fills and delay towers so their sound reaches each listener just after the mains and within ~35 ms of them; otherwise the audience hears a slapback echo instead of one source.
Carve a 2-4 kHz vocal pocket: dip guitars/synths there so the voice streams out front clearly.
Two sources fighting in one Bark band mask each other, so notch one and let the other own it.
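The delay-tower timing rule can be sketched numerically. The Python below is a minimal sketch, assuming straight-line propagation at 343 m/s, a tower fed at the same moment as the mains, and a hypothetical 10 ms extra offset so the mains always arrive first. It uses the ~35 ms precedence window as its echo limit. The helper names are illustrative only.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: air at about 20 °C

def flight_time_ms(distance_m: float) -> float:
    """Time for sound to travel distance_m metres, in milliseconds."""
    return distance_m / SPEED_OF_SOUND_M_S * 1000.0

def tower_delay_ms(mains_to_tower_m: float, precedence_offset_ms: float = 10.0) -> float:
    """Delay for a tower fed at the same time as the mains: cover the flight
    time from the mains to the tower, then add a small offset so the mains
    arrive first. The 10 ms default is an illustrative assumption."""
    return flight_time_ms(mains_to_tower_m) + precedence_offset_ms

def arrival_gap_ms(to_mains_m: float, to_tower_m: float, tower_delay: float) -> float:
    """Tower arrival minus mains arrival at one listener (positive = mains first)."""
    return (tower_delay + flight_time_ms(to_tower_m)) - flight_time_ms(to_mains_m)

def judge_gap(gap_ms: float, echo_limit_ms: float = 35.0) -> str:
    """Rough precedence-effect verdict for one listening position."""
    if gap_ms < 0:
        return "tower first: image pulls toward the tower"
    if gap_ms <= echo_limit_ms:
        return "fused: one source, localised at the mains"
    return "echo risk: heard as a separate slapback"

# Tower 60 m out in the audience from the mains; listener 20 m behind it, on axis.
delay = tower_delay_ms(60.0)
gap = arrival_gap_ms(to_mains_m=80.0, to_tower_m=20.0, tower_delay=delay)
print(round(delay, 1), round(gap, 1), judge_gap(gap))
# 184.9 10.0 fused: one source, localised at the mains
```

Because the offset is added once at the tower, every on-axis listener behind it hears the mains about 10 ms first; listeners off axis get a different gap, which is why arrival_gap_ms takes per-seat distances.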

Everyday analogy

Like glancing at a crowded room and instantly seeing separate people instead of a coloured blur, your brain instantly carves one wall of sound into a voice, a drum and a guitar.

Watch out

Myth: louder = clearer. Truth: clarity comes from SEPARATION (distinct EQ band, pan position and timing) not volume; cranking a buried source just masks everything else.

Fun fact

The precedence effect means a copy of a sound arriving from a second direction up to ~35 ms later is still heard as ONE source, located where the first arrival came from. That is the trick behind why a stereo PA and delay towers don't sound like an echo chamber.

Key takeaways

One waveform hits the ear; the brain rebuilds many sources from it.
Fusion cues: shared fundamental, common onset, common movement.
Streaming cues: pitch, timbre and direction link sounds over time.
Location comes from ITD (timing) and ILD (level) between your two ears.
Mix to HELP it: unique EQ band + pan position + timing per source.
