If a second copy of a sound arrives within roughly 40 milliseconds of the first, you hear one louder sound, not an echo.
A second copy under ~40 ms fuses into one sound, located at whichever arrives first.
What it is
The Haas (precedence) effect: if a second copy of a sound arrives within about 40 ms of the first, you hear one fused sound, not an echo.
Key facts
Fusion window: roughly 1-40 ms delay = heard as ONE sound, not two.
Localisation locks to the FIRST-arriving wavefront, even if the later copy is up to ~10 dB LOUDER.
Echo threshold: past ~40-50 ms (speech) you start hearing a distinct echo / slap.
Music tolerates longer (~50-80 ms) before splitting; clicks/transients split as early as ~5 ms.
Speed of sound in air = 343 m/s at 20 C (about 1125 ft/s).
Distance per ms of delay = 0.343 m (34.3 cm) per millisecond.
Delay time formula: time (ms) = distance (m) divided by 0.343 (or distance in metres x 2.91).
Below ~1 ms the two copies sum into comb filtering and image shift, NOT Haas fusion.
+6 dB = doubling sound pressure (2x voltage); +10 dB = roughly 2x perceived loudness.
-3 dB = half the power (half-power point); -6 dB = half the pressure.
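The delay and decibel figures above can be checked with a few lines of arithmetic. This is a minimal sketch in Python: the function names are illustrative (not from any library), and the 343 m/s constant assumes air at 20 C as stated above.

```python
import math

# Assumption: air at 20 C, matching the figure quoted in the key facts.
SPEED_OF_SOUND_M_S = 343.0


def delay_ms(distance_m: float) -> float:
    """Time in milliseconds for sound to travel distance_m metres."""
    return distance_m / SPEED_OF_SOUND_M_S * 1000.0


def db_from_pressure_ratio(ratio: float) -> float:
    """Level change in dB for a sound-pressure (or voltage) ratio."""
    return 20.0 * math.log10(ratio)


def db_from_power_ratio(ratio: float) -> float:
    """Level change in dB for a power ratio."""
    return 10.0 * math.log10(ratio)


print(round(delay_ms(10.0), 1))               # 10 m of extra path
print(round(db_from_pressure_ratio(2.0), 2))  # doubling pressure
print(round(db_from_power_ratio(0.5), 2))     # halving power
```

Run as written, this prints roughly 29.2 (ms), 6.02 (dB) and -3.01 (dB), which line up with the 0.343 m per ms, +6 dB and -3 dB rules of thumb.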
How it works
Take a sound and a copy of it from a second source.
Delay the copy by about 1 to 40 ms.
Brain fuses the two into a single sound (precedence effect).
Perceived location snaps to whichever arrives FIRST.
Image stays pulled toward the first speaker even if the second is up to ~10 dB louder.
Push delay past ~40-50 ms and the copy splits off as an audible echo.
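The steps above boil down to three delay regions, which can be sketched as a tiny lookup. This is a rough illustration only: the function name and the hard 1 ms and 40 ms boundaries are assumptions for the sketch, since the real thresholds are soft and depend on the material (speech, music, clicks) and on level.

```python
def perceived_result(delay_ms: float, echo_threshold_ms: float = 40.0) -> str:
    """Roughly classify what a listener hears for a given copy delay.

    echo_threshold_ms is an assumed default for speech-like material;
    raise it for sustained music, lower it for clicks and transients.
    """
    if delay_ms < 0:
        raise ValueError("delay_ms must be non-negative")
    if delay_ms < 1.0:
        # Too short for fusion: the copies sum and the image shifts.
        return "comb filtering / image shift"
    if delay_ms <= echo_threshold_ms:
        # Fusion window: one sound, placed at the first arrival.
        return "fused, localised to first arrival"
    return "audible echo"


for d in (0.5, 15.0, 60.0):
    print(d, "ms ->", perceived_result(d))
```

Passing a larger `echo_threshold_ms` (for example 80 for sustained music) moves the point at which the copy is reported as an echo.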
Real examples
Mono vocal copied to a second speaker 15 ms late: image still snaps to the first speaker, not the centre.
Stereo widener plugin: pan one side, delay the copy 10-25 ms to fatten a guitar or synth.
Delay-tower / fill speaker timed so the audience still hears the sound as coming from the stage.
Hand clap in a small tiled room: reflections under 40 ms make one big clap, not an echo.
Doubling a DI bass with a 20 ms delayed copy to add width without obvious repeat.
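The widening examples above all reduce to "copy the signal and delay one side by a few milliseconds". Here is a minimal sketch in plain Python, assuming the audio is a list of float samples. `haas_widen` is a hypothetical helper name, and the 48 kHz sample rate and 15 ms default are illustrative values, not recommendations.

```python
def haas_widen(mono, sample_rate=48000, delay_ms=15.0):
    """Return (left, right) where right is mono delayed by delay_ms.

    Both channels are padded with silence to the same length so they
    can be interleaved or written out as a stereo pair.
    """
    delay_samples = round(sample_rate * delay_ms / 1000.0)
    samples = list(mono)
    left = samples + [0.0] * delay_samples   # undelayed side, padded at the end
    right = [0.0] * delay_samples + samples  # delayed side, silence first
    return left, right


# Tiny demonstration at an artificially low sample rate (1 kHz),
# so a 3 ms delay is exactly 3 samples.
left, right = haas_widen([1.0, 0.5], sample_rate=1000, delay_ms=3.0)
print(left)
print(right)
```

Because the left channel arrives first, the image leans left even though both sides carry the same level.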
How it helps in live sound
Delay/fill towers: delay the tower by (distance from the main PA in metres x 2.91) ms so the fill never fires ahead of the main's arriving wavefront, keeping the image on stage.
Add a small extra 5-15 ms 'Haas offset' on top of physical delay so the audience localises to the stage, not the nearest box.
Stereo width: duplicate a mono track, pan hard L/R, delay one side 10-30 ms (watch mono fold-down for comb filtering).
Centre cluster: time the centre-cluster vocal to arrive first and the vocal image stays centred, even when the wide L/R PA is running louder.
Keep all Haas delays under ~40 ms or you get slapback; check the worst seat, not just front-of-house.
Speed of sound shifts ~0.6 m/s per degree C, so re-time delays if temperature swings a lot outdoors.
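The tower timing, temperature drift and mono fold-down notes above can be put into numbers. This is a sketch under stated assumptions: the function names are hypothetical, the speed of sound is approximated linearly as 343 m/s at 20 C plus 0.6 m/s per degree C, and the 10 ms Haas offset default is just one value from the 5-15 ms range mentioned above. The notch formula is the standard result for summing a signal with an equal-level delayed copy (notches at odd multiples of 1 / (2 x delay)).

```python
def speed_of_sound_m_s(temp_c: float) -> float:
    """Linear approximation around 20 C (assumed, per the notes above)."""
    return 343.0 + 0.6 * (temp_c - 20.0)


def tower_delay_ms(distance_m, temp_c=20.0, haas_offset_ms=10.0):
    """Return (physical_ms, total_ms) for a delay tower.

    physical_ms aligns the tower with the main PA's arriving wavefront;
    total_ms adds the Haas offset so the main still arrives first.
    """
    physical_ms = distance_m / speed_of_sound_m_s(temp_c) * 1000.0
    return physical_ms, physical_ms + haas_offset_ms


def first_comb_notch_hz(delay_ms: float) -> float:
    """Lowest notch when a signal is summed with an equal-level delayed copy."""
    return 1000.0 / (2.0 * delay_ms)


for temp in (10.0, 20.0, 35.0):
    physical, total = tower_delay_ms(30.0, temp_c=temp)
    print(f"{temp:.0f} C: physical {physical:.1f} ms, with offset {total:.1f} ms")

print(f"First notch for a 15 ms copy: {first_comb_notch_hz(15.0):.1f} Hz")
```

For a tower 30 m from the mains, the physical delay moves by a few milliseconds between a cool evening and a hot afternoon, which is why re-timing is worth doing when the temperature swings.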
Everyday analogy
Clapping once in a tiled bathroom: the quick reflections do not sound like extra claps, they just make the one clap sound bigger and closer to where it started.
Watch out
Myth: the louder speaker always wins the image. Truth: under ~40 ms the FIRST-arriving sound wins localisation even if the second is up to ~10 dB louder.
Fun fact
A delayed copy can be up to about 10 dB LOUDER than the original yet your brain still places the sound at the quieter, first-arriving speaker.
Key takeaways
First wavefront wins: you localise to whatever sound arrives first.
Under ~40 ms = one fused sound; over ~40-50 ms = audible echo.
Sound travels 34.3 cm per millisecond (343 m/s at 20 C).
Louder does not beat earlier inside the fusion window (up to ~10 dB).
Use it to widen or steer a PA without creating slapback.