6. Audio Compression & Perceptual Coding · Concept 2 of 6

Transform Coding

A trick that rewrites the sound as a list of which pitches are present instead of a raw wiggle of the speaker.

[Figure: Transform Coding: wave to frequency recipe. An audio block is transformed into frequency bars, then the inaudible masked bars are dropped to compress.
Panel 1: Raw wave (one block), amplitude over time.
Panel 2: After the MDCT, frequency bars (the recipe): level against frequency, low pitch to high pitch, with the masking threshold drawn across.
Panel 3: Drop the masked bars (below the line) = compression.
Why it shrinks so hard: keep the loud, audible bars (blue); drop the masked bars (red) that ears miss; MP3 at 128 kbps is about 11:1 against raw CD audio.
N samples in, N/2 coefficients out; 50% overlap plus TDAC means nothing is lost in the transform itself.
Long blocks (576 samples) = pitch detail; short blocks (192 samples) = sharp transients.
Lossy: dropped bars never come back, so keep WAV masters.]

Raw wave to frequency recipe via MDCT, then masked bars dropped - that sort-and-drop is how MP3/AAC squeeze audio.
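
The roughly 11:1 figure quoted for 128 kbps MP3 against raw CD audio can be checked with a few lines of arithmetic. This is a minimal sketch: the constant and function names are made up for illustration, and it assumes standard CD audio (44.1 kHz, 16-bit, stereo).

```python
# Assumption: standard CD audio is 44.1 kHz, 16 bits per sample, 2 channels.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2

# Raw CD bitrate in bits per second: 44_100 * 16 * 2 = 1_411_200.
cd_bitrate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS


def compression_ratio(coded_bitrate_bps):
    """How many times smaller the coded stream is than raw CD audio."""
    return cd_bitrate / coded_bitrate_bps


ratio_128k = compression_ratio(128_000)  # 1_411_200 / 128_000 = 11.025
ratio_320k = compression_ratio(320_000)  # 1_411_200 / 320_000 = 4.41

print(f"128 kbps: {ratio_128k:.1f}:1")
print(f"320 kbps: {ratio_320k:.1f}:1")
```

Even at 320 kbps the stream is more than four times smaller than the CD, so something has still been discarded.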

What it is

Transform coding chops audio into short overlapping chunks and rewrites each chunk as a list of how much of each frequency is present, rather than as raw samples.

Key facts

How it works

  1. Slice the audio into short overlapping blocks (frames), e.g. an MP3 frame carries 1152 samples per channel, handled as two granules of 576.
  2. Apply a window to taper each block's edges so they fade in and out.
  3. Run the MDCT to turn time-samples into frequency coefficients (the recipe of pitches).
  4. Run a psychoacoustic model to find which frequencies are masked (inaudible).
  5. Quantise: keep audible coefficients accurately, give masked ones few or zero bits.
  6. Huffman-pack the kept coefficients into the bitstream; the decoder runs the inverse MDCT and overlap-adds neighbouring blocks to rebuild the wave.
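
The steps above can be sketched in plain Python. This is a toy illustration, not how a real MP3 or AAC encoder is written. All function names are hypothetical. The "psychoacoustic model" is replaced by a crude assumption (drop anything below a fixed fraction of the loudest coefficient in the block). Huffman packing is skipped, so the sketch only counts how many coefficients survive.

```python
import math


def sine_window(n):
    """Step 2: sine window; w[i]**2 + w[i + n//2]**2 == 1, which TDAC needs."""
    return [math.sin(math.pi * (i + 0.5) / n) for i in range(n)]


def mdct(block):
    """Step 3: window a block of n samples, return n/2 frequency coefficients."""
    n, m = len(block), len(block) // 2
    w = sine_window(n)
    return [sum(w[i] * block[i] * math.cos(math.pi / m * (i + 0.5 + m / 2) * (k + 0.5))
                for i in range(n)) for k in range(m)]


def imdct(coeffs):
    """Decoder: n/2 coefficients back to n windowed (still aliased) samples."""
    m = len(coeffs)
    w = sine_window(2 * m)
    return [w[i] * (2.0 / m) * sum(coeffs[k] * math.cos(math.pi / m * (i + 0.5 + m / 2) * (k + 0.5))
                                   for k in range(m)) for i in range(2 * m)]


def quantise(coeffs, floor_ratio, step):
    """Steps 4-5 (toy stand-in, NOT a real psychoacoustic model): zero anything
    below floor_ratio * loudest coefficient, round the rest to multiples of step."""
    floor = floor_ratio * max(abs(c) for c in coeffs)
    return [0.0 if abs(c) < floor else round(c / step) * step for c in coeffs]


def encode(signal, n, floor_ratio=0.0, step=None):
    """Step 1 plus 2-5: slice into blocks of n samples with 50% overlap."""
    frames = []
    for start in range(0, len(signal) - n + 1, n // 2):
        coeffs = mdct(signal[start:start + n])
        if step is not None:
            coeffs = quantise(coeffs, floor_ratio, step)
        frames.append(coeffs)
    return frames


def decode(frames, length):
    """Inverse MDCT each block, then overlap-add so the aliasing cancels."""
    m = len(frames[0])
    out = [0.0] * length
    for j, coeffs in enumerate(frames):
        for i, value in enumerate(imdct(coeffs)):
            out[j * m + i] += value
    return out


def max_error(a, b, n):
    """Largest difference over the samples covered by two overlapping blocks."""
    m = n // 2
    return max(abs(a[i] - b[i]) for i in range(m, len(a) - m))


# Made-up test signal: a loud low tone plus a very quiet high tone.
N = 32
signal = [math.sin(2 * math.pi * 0.1 * i) + 0.01 * math.sin(2 * math.pi * 0.37 * i)
          for i in range(160)]

lossless = decode(encode(signal, N), len(signal))
lossy_frames = encode(signal, N, floor_ratio=0.05, step=0.05)
lossy = decode(lossy_frames, len(signal))

total = sum(len(f) for f in lossy_frames)
kept = sum(1 for f in lossy_frames for c in f if c != 0.0)
print(f"kept {kept} of {total} coefficients")
print(f"max error, nothing dropped: {max_error(signal, lossless, N):.2e}")
print(f"max error, toy quantiser:   {max_error(signal, lossy, N):.2e}")
```

With quantisation switched off, the round trip error is only floating-point noise. With the toy quantiser on, most coefficients become zero and the error is permanent, which is the lossy part of the scheme.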

Real examples

How it helps in live sound

Everyday analogy

It is like describing a chord by naming the notes in it instead of hand-drawing the whole vibrating string.

Watch out

Myth: a 320 kbps MP3 is identical to the CD. Reality: the quantisation stage of transform coding permanently discards masked detail. A well-encoded file may sound transparent, but the original samples are gone.

Fun fact

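The MDCT outputs only N/2 coefficients from N samples yet loses nothing in the transform itself: each block overlaps its neighbours by 50%, and the aliasing in one block is cancelled by the aliasing in the next (time-domain aliasing cancellation, TDAC), so the missing half is perfectly reconstructed on decode.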
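
For readers who want the formulas behind that claim, here is one common way to write the MDCT and its inverse, using the same convention as above (N samples in, N/2 coefficients out). The symbols w (window), y (decoded block before overlap-add) and the "prev"/"cur" labels are notation chosen for this sketch, and other texts scale the transform differently.

```latex
% Forward MDCT: N windowed samples in, N/2 coefficients out
X_k = \sum_{n=0}^{N-1} w_n\, x_n
      \cos\!\left[\frac{2\pi}{N}\left(n + \tfrac{1}{2} + \tfrac{N}{4}\right)\left(k + \tfrac{1}{2}\right)\right],
\qquad k = 0, \dots, \tfrac{N}{2} - 1

% Inverse MDCT: N aliased samples per block, windowed again
y_n = \frac{4}{N}\, w_n \sum_{k=0}^{N/2-1} X_k
      \cos\!\left[\frac{2\pi}{N}\left(n + \tfrac{1}{2} + \tfrac{N}{4}\right)\left(k + \tfrac{1}{2}\right)\right],
\qquad n = 0, \dots, N - 1

% Window condition (Princen-Bradley) for a symmetric window, w_n = w_{N-1-n};
% met for example by the sine window w_n = \sin\left[\pi (n + 1/2) / N\right]
w_n^2 + w_{n+N/2}^2 = 1

% Overlap-add of consecutive blocks (n counted from the start of the current block):
% the aliasing terms have opposite signs and cancel
x_n = y^{(\mathrm{prev})}_{n+N/2} + y^{(\mathrm{cur})}_n, \qquad 0 \le n < \tfrac{N}{2}
```

A single block on its own cannot be inverted, since N/2 numbers cannot hold N independent samples. It is only the sum of two overlapping decoded blocks that returns the original.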

Key takeaways

  • Transform coding = swap raw wiggle for a frequency recipe per short block.
  • MP3/AAC use the MDCT with 50% overlapping windows and TDAC.
  • Sorting into frequencies lets a psychoacoustic model drop masked, inaudible parts.
  • Long blocks = pitch detail for steady tones; short blocks = timing for transients.
  • It is lossy: discarded detail never comes back, so keep WAV masters.
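
The long-block versus short-block trade-off can be put into numbers. This is a rough sketch with hypothetical names, assuming a 44.1 kHz sample rate and treating the block length as both the number of frequency lines and the time step between blocks (the actual window is twice as long because of the 50% overlap).

```python
SAMPLE_RATE_HZ = 44_100  # assumption: CD-rate audio


def block_resolution(lines, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return (Hz covered by each frequency line, milliseconds per block)."""
    nyquist_hz = sample_rate_hz / 2           # highest representable frequency
    hz_per_line = nyquist_hz / lines          # finer is better for pitch detail
    block_ms = 1000 * lines / sample_rate_hz  # shorter is better for transients
    return hz_per_line, block_ms


long_hz, long_ms = block_resolution(576)    # 38.28 Hz per line, about 13.06 ms
short_hz, short_ms = block_resolution(192)  # 114.84 Hz per line, about 4.35 ms

print(f"long block:  {long_hz:.2f} Hz per line, {long_ms:.2f} ms")
print(f"short block: {short_hz:.2f} Hz per line, {short_ms:.2f} ms")
```

A short block is three times coarser in pitch but three times finer in time, which is why encoders switch to it around sharp hits such as snare drums or claps.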