Convolution Theorem

A rule that says blending (convolving) two sounds in time gives the same result as multiplying their frequency lists, which is far cheaper to compute.

Convolution in time (a slow N² smear) becomes three quick steps: FFT, spectrum multiply, IFFT.

What it is

Convolution Theorem: blending two signals in time equals just multiplying their frequency spectra, point by point.

Key facts

Formula: x(t) * h(t) = IFFT( X(f) · H(f) ), where * is convolution and · is plain point-by-point multiply.
Symbols: x(t) = your audio in time, h(t) = the impulse response, X(f)/H(f) = their spectra (FFT outputs), f = frequency in Hz.
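Direct convolution cost = O(N^2) multiply-adds; FFT route = O(N log N). At N=65,536 that's ~4.3 billion vs ~1 million ops per FFT, roughly 4000x fewer. The full route needs three transforms on zero-padded data, so the real saving is smaller but still large.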
FFT (Fast Fourier Transform) needs ~N log2(N) operations; the multiply step in the middle is just N multiplies.
Reverse rule (duality): multiply in time = convolve in frequency. The two domains swap operations.
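An impulse response (IR) is the reverb 'fingerprint' captured by firing one short click (an approximation of a Dirac impulse) into a real room or hardware; in practice a sine sweep is often played instead and deconvolved into the IR.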
Convolving your dry signal with a hall IR = the sound 'as if' played in that hall. That IS convolution reverb.
Standard sample rates: 44.1 kHz (CD), 48 kHz (video/live), 96 kHz (hi-res). 48 kHz = 48,000 samples every second per channel.
Nyquist limit = sample_rate / 2: 48 kHz captures up to 24 kHz; 44.1 kHz up to 22.05 kHz. Human hearing tops out ~20 kHz.
Latency dodge: 'partitioned convolution' splits a long IR into small FFT blocks so reverb runs live with only a few ms delay.
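
The cost figures above can be checked with a few lines of arithmetic. This is only a rough sketch: the helper names direct_ops and fft_ops are made up for illustration, constant factors are ignored, and the count is for a single FFT rather than the full FFT, multiply, IFFT chain.

```python
import math

def direct_ops(n: int) -> int:
    """Approximate multiply-adds for direct convolution of two length-n signals."""
    return n * n

def fft_ops(n: int) -> int:
    """Rough operation count for one length-n FFT, using n * log2(n)."""
    return int(n * math.log2(n))

n = 65_536
print(direct_ops(n))                # 4294967296  (~4.3 billion)
print(fft_ops(n))                   # 1048576     (~1 million)
print(direct_ops(n) // fft_ops(n))  # 4096        (~4000x)
```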

How it works

Take your dry audio and the impulse response (IR).
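Zero-pad both to at least (audio length + IR length - 1) samples so the tail does not wrap around, then run an FFT on each to get two frequency lists, X(f) and H(f).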
Multiply the two lists together, bin by bin (complex multiply).
Run an inverse FFT (IFFT) on the result to get audio back.
Out comes the dry sound smeared with the IR's room and tone.
For live use, chop the IR into blocks (partitioned) to keep latency low.
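
The steps above can be sketched in plain Python. This is a minimal illustration, not production DSP: the tiny recursive radix-2 FFT and the names fft, direct_convolve and fft_convolve are written here for the example (a real project would normally use an optimized FFT library), and both signals are zero-padded to a power of two so the result is ordinary (linear) convolution rather than a wrapped-around one.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT. len(a) must be a power of two.
    With invert=True it runs the inverse transform WITHOUT the 1/N scaling."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def direct_convolve(x, h):
    """Reference O(N^2) convolution in the time domain."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def fft_convolve(x, h):
    """FFT -> bin-by-bin complex multiply -> IFFT."""
    out_len = len(x) + len(h) - 1
    size = 1
    while size < out_len:          # next power of two that fits the full result
        size *= 2
    X = fft([complex(v) for v in x] + [0j] * (size - len(x)))
    H = fft([complex(v) for v in h] + [0j] * (size - len(h)))
    Y = [a * b for a, b in zip(X, H)]
    y = fft(Y, invert=True)
    return [v.real / size for v in y[:out_len]]  # apply the 1/N scaling here

dry = [1.0, 2.0, 3.0, 4.0]   # stand-in for dry audio
ir = [1.0, 0.5, 0.25]        # stand-in for a very short impulse response

print(direct_convolve(dry, ir))   # [1.0, 2.5, 4.25, 6.0, 2.75, 1.0]
print(fft_convolve(dry, ir))      # same values, up to floating-point rounding
```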

Real examples

Convolution reverb: drop in a Sydney Opera House IR and your vocal sounds like it's in that hall.
Speaker/cab sims: convolve a DI guitar with a mic'd cabinet IR for amp-in-a-box tone.
Room correction (e.g. Dirac, Sonarworks): convolve playback with a correction IR to flatten room bumps.
FIR EQ: a linear-phase EQ curve is just an IR you convolve your signal with.
Game audio: real-time reverb on footsteps using tiny partitioned IRs so the CPU doesn't melt.

How it helps in live sound

Convolution reverb (Waves IR1, Altiverb, LiquidSonics) gives real venue tails that algorithmic units can only approximate.
Use partitioned/low-latency mode for monitors so reverb adds only a few ms, not 40+ ms.
Load a measured IR of YOUR actual room to model how a mix will sound front-of-house.
Capture cabinet IRs (e.g. with a Two Notes Torpedo) so DI'd guitars/bass need no mic on stage.
Longer IR tails = more CPU; trim the IR length to cut load when the CPU runs short across many channels.
Linear-phase FIR EQ = zero phase smear, but adds latency; avoid on live monitor sends.
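
The partitioning idea can be sketched as follows. The assumptions are loud ones: the function names and the block sizes (128 and 2048 samples) are illustrative only, and each partition is convolved directly here for brevity, whereas a real low-latency engine would run each partition through an FFT. The point is just that the partial results, delayed by their partition offset and summed, equal the full convolution, and that the block size sets the buffering delay.

```python
def convolve(x, h):
    """Plain time-domain convolution, used as the reference."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def partitioned_convolve(x, h, block):
    """Split the IR into block-sized partitions, convolve each with x,
    delay each partial result by its partition offset, and sum."""
    y = [0.0] * (len(x) + len(h) - 1)
    for start in range(0, len(h), block):
        partial = convolve(x, h[start:start + block])
        for n, v in enumerate(partial):
            y[start + n] += v
    return y

def block_latency_ms(block, sample_rate):
    """Time needed to fill one block of input samples, in milliseconds."""
    return 1000.0 * block / sample_rate

dry = [1.0, -0.5, 0.25, 0.0, 2.0]
ir = [1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125]

full = convolve(dry, ir)
parts = partitioned_convolve(dry, ir, block=2)
print(max(abs(a - b) for a, b in zip(full, parts)) < 1e-12)  # True

print(round(block_latency_ms(128, 48_000), 2))    # 2.67
print(round(block_latency_ms(2048, 48_000), 2))   # 42.67
```

Small first blocks keep the delay to a few ms; later, larger blocks can handle the long tail more efficiently.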

Everyday analogy

It is like multiplying two big numbers by adding their logs instead: you swap a hard job for an easy one, then convert back.

Watch out

Myth: convolution reverb 'mixes in' a reverb sound. Truth: it stamps your signal with a room's full impulse response via spectrum multiply, recreating that space as it was captured at one source and mic position.

Fun fact

A full impulse response of a cathedral can be 5+ seconds (over 240,000 samples at 48 kHz), yet the multiply trick lets a laptop convolve it in real time.
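
A quick arithmetic sketch shows why the shortcut matters here. It assumes the simplest possible direct method, where every output sample needs one multiply-add per IR sample; the helper name ir_samples is made up for illustration.

```python
def ir_samples(seconds: float, sample_rate: int) -> int:
    """Length of an impulse response in samples."""
    return round(seconds * sample_rate)

sample_rate = 48_000
ir_len = ir_samples(5.0, sample_rate)
print(ir_len)                      # 240000

# Direct convolution: one multiply-add per IR sample, for every output sample.
macs_per_second = ir_len * sample_rate
print(macs_per_second)             # 11520000000  (~11.5 billion per second, per channel)
```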

Key takeaways

Convolution in time = multiplication in frequency. Same result, far less work.
FFT -> multiply -> IFFT is the shortcut: O(N log N) beats O(N^2).
Impulse response (IR) = the captured 'fingerprint' of a room, speaker or EQ.
This theorem is WHY convolution reverb and cab sims run live without melting the CPU.
Duality: multiply in time = convolve in frequency; the domains swap jobs.
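
The duality rule can be checked numerically on a tiny example. This is a sketch under stated assumptions: it uses a naive O(N^2) DFT written for clarity (names dft and circular_convolve are illustrative), the frequency-domain convolution is circular, and with this unscaled DFT convention a 1/N factor appears on the frequency side.

```python
import cmath

def dft(a):
    """Naive discrete Fourier transform (unscaled forward transform)."""
    n = len(a)
    return [sum(a[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def circular_convolve(a, b):
    """Circular convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[m] * b[(k - m) % n] for m in range(n)) for k in range(n)]

x = [1.0, 2.0, 0.0, -1.0]   # signal in time
w = [0.5, 1.0, 1.0, 0.5]    # e.g. a window multiplied onto it in time

# Multiply in time, then transform...
lhs = dft([xi * wi for xi, wi in zip(x, w)])
# ...equals convolving the two spectra (circularly), scaled by 1/N.
rhs = [v / len(x) for v in circular_convolve(dft(x), dft(w))]

print(all(abs(a - b) < 1e-9 for a, b in zip(lhs, rhs)))  # True
```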