MDCT

The specific maths tool most music formats use to turn a slice of sound into its list of frequencies without clicks at the joins.

MDCT slices audio into 50%-overlapping windowed blocks, reads off N frequency bins per block, then overlap-adds the decoded blocks so TDAC cancels the aliasing and the joins are click-free.

What it is

The maths tool inside MP3, AAC and AC-3 that turns overlapping slices of audio into frequency lists without clicks at the joins.

Key facts

MDCT = Modified Discrete Cosine Transform; a lapped transform (blocks overlap, not butt-joined).
Formula: X(k) = sum over n = 0..2N-1 of x(n) * cos[ (pi/N)(n + 0.5 + N/2)(k + 0.5) ], for k = 0..N-1. x(n) = the 2N windowed input samples, X(k) = output coefficients, N = number of bins.
50% overlap rule: each block takes 2N input samples but outputs only N coefficients (a 2:1 transform).
AAC-LC: N=1024 long block (2048-sample window) or N=128 short block (256-sample window) for transients. AC-3: N=256 long.
MP3 (Layer III): N=18 long or N=6 short per sub-band, after a 32-band filterbank = 576 frequency lines per granule.
Reconstruction uses TDAC (Time-Domain Aliasing Cancellation): overlap-add the overlapping halves of neighbouring blocks and the aliasing cancels exactly, giving click-free perfect reconstruction (before any quantisation).
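Window must satisfy the Princen-Bradley condition: w(n)^2 + w(n+N)^2 = 1 for n = 0..N-1, so the squared gains of the two overlapping window halves sum to unity (the window is applied once before the MDCT and again after the inverse). Common windows: sine or KBD (Kaiser-Bessel Derived).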
Block switching: codecs drop to short blocks on transients (drum hits, claps) to limit pre-echo. Quantisation noise spreads across the whole block, so with a long block it can land audibly before the hit.
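At 44.1 kHz an AAC long block advances 1024 samples (~23.2 ms) and its 2048-sample window spans ~46.4 ms; a short block advances 128 samples (~2.9 ms) and its 256-sample window spans ~5.8 ms.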
Long-block frequency resolution at 44.1 kHz = 44100 / 2048 = ~21.5 Hz per bin; MDCT is energy-compacting, packing most energy into few coefficients (this shrinks the file).
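
The formula, the window condition and the AAC figures above can be checked in a few lines. This is a minimal Python sketch, not codec code: the function names (sine_window, mdct), the tiny block size N = 8 and the test tone are assumptions chosen for illustration, and the cosine sum is evaluated directly rather than with the fast algorithms real encoders use.

```python
import math

def sine_window(N):
    """Sine window of length 2N: w(n) = sin(pi * (n + 0.5) / (2N))."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def mdct(block, N):
    """Direct O(N^2) evaluation of the MDCT sum: 2N samples in, N coefficients out."""
    assert len(block) == 2 * N
    return [
        sum(block[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
            for n in range(2 * N))
        for k in range(N)
    ]

N = 8  # illustrative size only; AAC long blocks use N = 1024
w = sine_window(N)

# Princen-Bradley check: w(n)^2 + w(n+N)^2 should equal 1 for every n.
pb_error = max(abs(w[n] ** 2 + w[n + N] ** 2 - 1.0) for n in range(N))

# One windowed block of an arbitrary test tone -> N coefficients.
coeffs = mdct([w[n] * math.sin(0.3 * n) for n in range(2 * N)], N)

# AAC figures at 44.1 kHz.
fs = 44100
hop_ms_long = 1000 * 1024 / fs   # new samples per long block, in ms
hop_ms_short = 1000 * 128 / fs   # new samples per short block, in ms
bin_hz = fs / 2048               # long-block bin spacing in Hz

print(len(coeffs), pb_error < 1e-12)
print(round(hop_ms_long, 1), round(hop_ms_short, 1), round(bin_hz, 1))
```

The sine window passes the check because w(n+N) is the cosine counterpart of w(n), so the two squares always sum to one.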

How it works

Slice audio into overlapping blocks, each sharing 50% (half) its samples with the next block.
Multiply each block by a smooth window (sine or KBD) so the edges taper to near zero.
Run the MDCT cosine sum: convert 2N time samples into N frequency coefficients.
Quantise and throw away the quiet/masked coefficients (this is where compression happens).
Decoder runs the inverse MDCT to get overlapping time blocks back.
Overlap-add the blocks; TDAC cancels the aliasing so the joins are seamless, no clicks.
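
The steps above can be sketched end to end in Python. This is an illustration under stated assumptions, not any codec's implementation: the names (sine_window, mdct, imdct, round_trip) are hypothetical, the quantisation step is skipped so that perfect reconstruction can be observed, the signal length is assumed to be a multiple of N, and the 2/N scaling on the inverse is one common normalisation that pairs with a Princen-Bradley window applied at both ends.

```python
import math

def sine_window(N):
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def mdct(block, N):
    # 2N time samples -> N frequency coefficients (direct cosine sum)
    return [sum(block[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N)) for k in range(N)]

def imdct(coeffs, N):
    # N coefficients -> 2N aliased time samples (2/N scaling assumed)
    return [(2.0 / N) * sum(coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                            for k in range(N)) for n in range(2 * N)]

def round_trip(signal, N):
    """Window, MDCT, inverse MDCT, window again, overlap-add."""
    w = sine_window(N)
    # Pad N zeros at each end so every real sample is covered by two blocks.
    padded = [0.0] * N + list(signal) + [0.0] * N
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):  # hop of N = 50% overlap
        block = [padded[start + n] * w[n] for n in range(2 * N)]
        coeffs = mdct(block, N)
        # A real encoder would quantise coeffs here; this sketch passes them through.
        y = imdct(coeffs, N)
        for n in range(2 * N):
            out[start + n] += y[n] * w[n]  # overlap-add; TDAC cancels the aliasing
    return out[N:N + len(signal)]

N = 8
signal = [math.sin(0.2 * n) + 0.5 * math.cos(1.3 * n) for n in range(64)]
rebuilt = round_trip(signal, N)
max_err = max(abs(x - y) for x, y in zip(signal, rebuilt))
print(max_err < 1e-9)
```

With no quantisation the rebuilt signal matches the input to floating-point precision; any difference in a real codec comes from the quantiser, not from the overlapping joins.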

Real examples

Every MP3 you drag into a playback laptop was built block-by-block with MDCT.
Apple Music streams AAC, and YouTube serves AAC alongside Opus; Spotify mainly uses Ogg Vorbis, which is also MDCT-based.
A movie's Dolby Digital (AC-3) surround track uses 256-coefficient MDCT blocks (512-sample windows).
A snare hit triggers block switching to 128-point short blocks to kill pre-echo.
Bluetooth's AAC and the newer LC3 codec both lean on MDCT-style transforms.

How it helps in live sound

Use WAV/FLAC (lossless, no MDCT loss) for show playback and walk-in beds; keep MP3/AAC for reference only.
If you must use compressed, go 320 kbps MP3 or 256 kbps+ AAC; pre-echo on transients gets audible below ~192 kbps.
Never transcode one lossy format to another (MP3 -> AAC) or re-encode a lossy file; each MDCT pass re-quantises and stacks artefacts.
Trust your ears on cymbal/clap-heavy tracks; low-bitrate MDCT smears transients and softens snap.
Carry a lossless master of every track on USB and the laptop; codec artefacts get exposed on a big PA.

Everyday analogy

Like cross-fading two film scenes: the cut is overlapped so the picture never jumps, and the two overlapping halves add back up to one clean shot.

Watch out

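Myth: the overlap wastes data or doubles the file. Truth: each block reads 2N samples but advances by only N new ones and outputs only N coefficients, so the coefficient count matches the sample count (critical sampling). TDAC makes the overlap free, with zero clicks.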
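
The arithmetic behind that can be written out. A small Python sketch, assuming blocks start every N samples and that one extra block is added so the first and last samples are overlapped as well; the function name mdct_coefficient_count is hypothetical.

```python
import math

def mdct_coefficient_count(num_samples, N):
    """Coefficients produced when 2N-sample blocks start every N samples."""
    num_blocks = math.ceil(num_samples / N) + 1  # +1 block so both ends are overlapped
    return num_blocks * N                        # each block yields N coefficients

num_samples = 60 * 44100  # one minute of mono audio at 44.1 kHz
num_coeffs = mdct_coefficient_count(num_samples, 1024)
ratio = num_coeffs / num_samples
print(num_samples, num_coeffs, round(ratio, 4))
```

The ratio stays within a fraction of a percent of 1; the only overhead is the rounding up to whole blocks plus one start-up block, not a doubling.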

Fun fact

A single MDCT block is mathematically non-invertible on its own (it throws away half the data as time aliasing). Only when you overlap-add the neighbouring block does the aliasing cancel and the audio reappear perfectly.
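
This can be seen numerically. The Python sketch below transforms one unwindowed block and inverts it on its own; the names and the 1/N scaling are assumptions for illustration. Writing the block as four quarters (a, b, c, d) and using r for time reversal, the lone inverse should return (a - b_r, b - a_r, c + d_r, d + c_r) / 2: the original mixed with mirrored copies of itself.

```python
import math

def mdct(block, N):
    return [sum(block[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N)) for k in range(N)]

def imdct(coeffs, N, scale):
    return [scale * sum(coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                        for k in range(N)) for n in range(2 * N)]

N = 4
block = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

# Invert a single block with no neighbour to overlap-add against.
alone = imdct(mdct(block, N), N, 1.0 / N)

# Predicted time-aliased result built from the four quarters of the block.
h = N // 2
a, b, c, d = block[:h], block[h:N], block[N:N + h], block[N + h:]
expected = ([(x - y) / 2 for x, y in zip(a, b[::-1])] +
            [(x - y) / 2 for x, y in zip(b, a[::-1])] +
            [(x + y) / 2 for x, y in zip(c, d[::-1])] +
            [(x + y) / 2 for x, y in zip(d, c[::-1])])

matches_aliasing = all(abs(x - y) < 1e-9 for x, y in zip(alone, expected))
matches_input = all(abs(x - y) < 1e-9 for x, y in zip(alone, block))
print(matches_aliasing, matches_input)
```

The mirrored terms carry opposite signs in the two halves that overlap between neighbouring blocks, which is why adding the neighbour removes them.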

Key takeaways

MDCT is the frequency engine inside MP3, AAC and AC-3.
Blocks overlap 50%; 2N samples in, N coefficients out.
TDAC plus a Princen-Bradley window = seamless, click-free joins.
Short blocks on transients stop pre-echo; long blocks for steady tones.
For live playback prefer lossless; MDCT loss is permanent and stacks if re-encoded.
