A trick that rewrites the sound as a list of which pitches are present instead of a raw wiggle of the speaker.
The MDCT turns the raw waveform into a frequency recipe, then the masked (inaudible) components are dropped - that sort-and-drop step is how MP3 and AAC squeeze audio.
What it is
Transform coding chops audio into short chunks and rewrites each chunk as a list of frequencies present, not raw samples.
Key facts
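Transforms a block of time samples into a list of frequency coefficients, then quantises or drops the inaudible ones
MP3 uses a hybrid filterbank: a 32-band polyphase filter followed by an MDCT (Modified Discrete Cosine Transform), giving 576 frequency lines per granule with long windows, or 192 per short window (three short windows per granule)
AAC uses a plain MDCT producing 1024 coefficients per long block (2048-sample window) or 128 per short block (256-sample window)
MDCT is critically sampled: each window of N time samples overlaps its neighbours by 50% and yields N/2 unique coefficients, so coefficients per second equal samples per second (no data added)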
Windows overlap 50% and use TDAC (Time-Domain Aliasing Cancellation) to kill block-edge clicks
Bit allocation is driven by a psychoacoustic model: bits go to audible parts, masked parts get few or zero bits
Masking: a loud tone hides quieter tones near it in frequency, just after it in time and, very briefly, just before it
Window switching: long blocks for steady tones (frequency detail), short blocks for transients (time detail, avoids pre-echo)
Typical compression: MP3 at 128 kbps = ~11:1 vs 1411 kbps CD; AAC ~256 kbps is near-transparent
Human hearing 20 Hz to 20 kHz; most coding bits target the most sensitive 2 to 5 kHz band
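A rough sketch of the long-versus-short block trade-off from the facts above. It assumes a 44.1 kHz sample rate (not stated in this card) and treats the AAC figures as the number of new samples, and of frequency lines, per block. The helper name block_resolution is illustrative only.

```python
SAMPLE_RATE_HZ = 44_100  # assumption: CD-style sample rate


def block_resolution(lines_per_block, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return (milliseconds of new audio per block, Hz covered by each frequency line)."""
    time_ms = 1000.0 * lines_per_block / sample_rate_hz
    hz_per_line = (sample_rate_hz / 2) / lines_per_block
    return time_ms, hz_per_line


for name, lines in [("AAC long", 1024), ("AAC short", 128)]:
    time_ms, hz_per_line = block_resolution(lines)
    print(f"{name:9s}: {time_ms:5.1f} ms per block, {hz_per_line:6.1f} Hz per line")
```

Long blocks give narrow frequency lines but a coarse time grid; short blocks do the opposite, which is why an encoder switches to them on transients.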
How it works
Slice the audio into short overlapping blocks, e.g. an MP3 frame carries 1152 samples (two 576-sample granules).
Apply a window to taper each block's edges so they fade in and out.
Run the MDCT to turn time-samples into frequency coefficients (the recipe of pitches).
Run a psychoacoustic model to find which frequencies are masked (inaudible).
Quantise: keep audible coefficients accurately, give masked ones few or zero bits.
Huffman-pack the kept coefficients into the bitstream; decoder runs inverse MDCT to rebuild the wave.
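The steps above can be sketched on a single block. This is a toy under stated assumptions, not how a real encoder is built: the sample rate, block size and test tone are arbitrary, and quantise() is a hypothetical stand-in for the psychoacoustic model that simply zeroes anything below 5% of the loudest coefficient. Huffman packing is left out.

```python
import math


def sine_window(length):
    """Sine window: tapers both ends of the block towards zero."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block):
    """Direct (slow) MDCT: 2N windowed samples in, N coefficients out."""
    half = len(block) // 2
    return [
        sum(x * math.cos(math.pi / half * (n + 0.5 + half / 2) * (k + 0.5))
            for n, x in enumerate(block))
        for k in range(half)
    ]


def quantise(coefs, floor_ratio=0.05, step=0.5):
    """Toy stand-in for the psychoacoustic stage (an assumption, not a real model):
    coefficients below floor_ratio * peak get zero bits, the rest are rounded to
    integer multiples of step."""
    peak = max(abs(c) for c in coefs)
    return [0 if abs(c) < floor_ratio * peak else round(c / step) for c in coefs]


SAMPLE_RATE_HZ = 8000   # assumed, chosen to keep the numbers small
BLOCK_LEN = 64          # 64 samples in, 32 frequency lines out
TONE_HZ = 1000          # assumed test tone

samples = [math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE_HZ) for n in range(BLOCK_LEN)]
window = sine_window(BLOCK_LEN)
coefs = mdct([s * w for s, w in zip(samples, window)])
levels = quantise(coefs)
kept = [k for k, level in enumerate(levels) if level != 0]

hz_per_line = (SAMPLE_RATE_HZ / 2) / len(coefs)
print(f"{len(kept)} of {len(coefs)} lines kept")
for k in kept:
    print(f"line {k}: centre {(k + 0.5) * hz_per_line:.1f} Hz, level {levels[k]}")
```

A steady tone collapses into a handful of non-zero lines around its pitch, which is what makes the later packing step cheap.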
Real examples
MP3 podcast at 128 kbps streams a 1-hour show in ~58 MB instead of ~635 MB of raw CD-quality audio.
AAC is a codec used by YouTube, Apple Music and most show-playback apps on your phone.
A cymbal hit (transient) triggers short blocks so it stays crisp with no smeared pre-echo.
A held synth pad uses long blocks, capturing fine pitch detail in fewer bits.
Dolby AC-3 (cinema/streaming surround) and Opus (in its CELT layer) both use the same MDCT transform-coding trick.
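The podcast file sizes can be checked with plain bitrate arithmetic. This sketch assumes decimal megabytes (1 MB = 1,000,000 bytes) and CD-quality raw audio at 44.1 kHz, 16-bit, stereo; size_megabytes is an illustrative helper name.

```python
def size_megabytes(bitrate_kbps, seconds):
    """Size in decimal megabytes of a constant-bitrate stream."""
    return bitrate_kbps * 1000 * seconds / 8 / 1_000_000


CD_KBPS = 44_100 * 16 * 2 / 1000  # sample rate x bit depth x channels = 1411.2 kbps
ONE_HOUR_S = 3600

mp3_mb = size_megabytes(128, ONE_HOUR_S)
raw_mb = size_megabytes(CD_KBPS, ONE_HOUR_S)

print(f"128 kbps MP3, 1 hour: {mp3_mb:.1f} MB")
print(f"Raw CD audio, 1 hour: {raw_mb:.1f} MB")
print(f"Compression ratio:    {CD_KBPS / 128:.1f}:1")
```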
How it helps in live sound
For walk-in and interval music, run backing tracks at 256 kbps AAC or 320 kbps MP3, not 128 - cymbals and reverb tails survive on a big PA.
Carry true WAV/AIFF masters for anything mission-critical; transform-coded files lose data you can't get back.
Never re-encode an MP3 into another MP3 - quantisation errors stack and high frequencies crumble.
Low-bitrate files smear transients (pre-echo) - a kick or clap can sound soft or splatty through subs; bump the bitrate.
Watch the 2 to 5 kHz band: codecs protect it because ears are most sensitive there, so vocal clarity usually survives even cheap files.
When DJs hand you Spotify/streaming rips, expect a 320 kbps lossy ceiling (Spotify Ogg Vorbis, Apple Music ~256 kbps AAC) - fine for fills, not for a featured artist set.
Everyday analogy
It is like describing a chord by naming the notes in it instead of hand-drawing the whole vibrating string.
Watch out
Myth: a 320 kbps MP3 is identical to the CD. Reality: transform coding permanently discards masked detail; it only sounds transparent, the original samples are gone.
Fun fact
The MDCT outputs only N/2 coefficients from N samples yet, before quantisation, loses nothing - the 50% block overlap cancels its own aliasing (TDAC), so the missing half is perfectly reconstructed on decode.
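The cancellation can be checked numerically. The sketch below is a minimal, slow reference, assuming a sine window (one window that meets the TDAC condition) and an arbitrary test signal. It transforms 50%-overlapping blocks, inverts each one, windows again and overlap-adds. No quantisation is applied, so any error is floating-point rounding only.

```python
import math


def sine_window(length):
    """Sine window: its squared halves sum to 1 across the 50% overlap."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block):
    """2N windowed samples in, N coefficients out."""
    half = len(block) // 2
    return [
        sum(x * math.cos(math.pi / half * (n + 0.5 + half / 2) * (k + 0.5))
            for n, x in enumerate(block))
        for k in range(half)
    ]


def imdct(coefs):
    """N coefficients in, 2N time-aliased samples out (2/N scaling for windowed use)."""
    half = len(coefs)
    return [
        (2.0 / half) * sum(c * math.cos(math.pi / half * (n + 0.5 + half / 2) * (k + 0.5))
                           for k, c in enumerate(coefs))
        for n in range(2 * half)
    ]


def roundtrip(signal, half):
    """Window, MDCT, IMDCT, window again, then overlap-add with hop size `half`."""
    window = sine_window(2 * half)
    output = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * half + 1, half):
        block = [signal[start + n] * window[n] for n in range(2 * half)]
        aliased = imdct(mdct(block))
        for n in range(2 * half):
            output[start + n] += aliased[n] * window[n]
    return output


HALF = 8  # coefficients per block; each block spans 16 samples
signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(6 * HALF)]
rebuilt = roundtrip(signal, HALF)

# Only samples covered by two overlapping blocks can be reconstructed,
# so the first and last half-block are excluded from the comparison.
interior = range(HALF, len(signal) - HALF)
worst_error = max(abs(signal[n] - rebuilt[n]) for n in interior)
print(f"worst interior reconstruction error: {worst_error:.2e}")
```

Each block alone comes back with its halves folded onto themselves; it is only the sum of two neighbouring blocks that restores the original samples.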
Key takeaways
Transform coding = swap raw wiggle for a frequency recipe per short block.
MP3/AAC use the MDCT with 50% overlapping windows and TDAC.
Sorting into frequencies lets a psychoacoustic model drop masked, inaudible parts.
Long blocks = pitch detail for steady tones; short blocks = timing for transients.
It is lossy: discarded detail never comes back, so keep WAV masters.