5. Information Theory (The Deep Root) · Concept 5 of 6
Rate Distortion Theory
It is the rulebook for the trade between making a file smaller and how much sound quality you are willing to lose doing it.
The R(D) curve sets the hard floor: spend bits where the ear hears them, and past the knee extra bitrate buys nothing.
What it is
The maths of the trade between file size (bitrate) and how much sound quality you accept losing.
Key facts
Introduced by Claude Shannon (1948) and formalised in his 1959 paper on coding for a source with a fidelity criterion.
Rate R = bits per sample (or bits/sec). Distortion D = how far decoded sound sits from the original, usually mean squared error (MSE).
The R(D) function = the LOWEST bitrate that can hit a given distortion D. No codec can beat this curve, ever.
Gaussian formula: R(D) = 0.5 x log2(sigma^2 / D) for 0 < D <= sigma^2 (and R(D) = 0 once D >= sigma^2), where sigma^2 = signal power (variance) and D = allowed MSE. Halving D costs +0.5 bit/sample.
SNR rule: ~6.02 dB signal-to-noise per bit. 16-bit = ~96 dB dynamic range, 24-bit = ~144 dB.
CD = 44.1 kHz x 16 bit x 2 ch = 1411 kbps uncompressed PCM (the lossless reference).
MP3 128 kbps = ~11:1 compression vs CD; 320 kbps = ~4.4:1, so 320 keeps ~2.5x more bits per second.
Transparent zone (most ears can't ABX it): MP3 ~256-320 kbps, AAC ~256 kbps, Opus ~96-128 kbps.
Opus is the efficiency king: ~96-128 kbps Opus rivals ~256-320 kbps MP3 for the same perceived quality.
Lossy throws away sound BELOW the masking threshold (quiet bits hidden by loud ones); lossless (FLAC/ALAC) discards nothing but typically manages only about 2:1, depending on the material.
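The Gaussian formula and the SNR rule above are two views of the same curve. A minimal Python sketch, assuming a unit-power Gaussian source (the function names are made up for illustration), checks the +0.5 bit per halving of D and the ~6.02 dB per bit:

```python
import math

def gaussian_rate(sigma2: float, d: float) -> float:
    """Lowest rate (bits/sample) for a Gaussian source of variance sigma2 at MSE d.

    R(D) = 0.5 * log2(sigma2 / D) for 0 < D <= sigma2, and 0 once D >= sigma2
    (at that point you can send nothing and just output the mean).
    """
    if d <= 0:
        raise ValueError("distortion must be positive")
    return max(0.0, 0.5 * math.log2(sigma2 / d))

def snr_db_for_bits(bits: float) -> float:
    """Ideal SNR in dB at a given rate, from inverting the Gaussian R(D).

    D = sigma2 * 2**(-2R), so SNR = 10*log10(sigma2 / D) = 20 * R * log10(2).
    """
    return 20 * bits * math.log10(2)

sigma2 = 1.0  # assumed unit signal power
d = 0.01      # assumed target MSE
extra = gaussian_rate(sigma2, d / 2) - gaussian_rate(sigma2, d)
print(f"halving D costs {extra:.2f} bit/sample")  # 0.50
print(f"per bit: {snr_db_for_bits(1):.2f} dB")     # 6.02
print(f"16-bit: {snr_db_for_bits(16):.1f} dB")     # 96.3
print(f"24-bit: {snr_db_for_bits(24):.1f} dB")     # 144.5
```

The same ~6.02 dB per bit slope shows up in the familiar PCM quantiser rule; for a full-scale sine that rule adds a constant 1.76 dB, which the rounded ~96 dB and ~144 dB figures leave out.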
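The CD bitrate and compression ratios above are plain arithmetic. A small Python sketch (the helper name is hypothetical) reproduces them:

```python
def pcm_kbps(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Uncompressed PCM bitrate in kbps (1 kbps = 1000 bit/s)."""
    return sample_rate_hz * bit_depth * channels / 1000

cd = pcm_kbps(44_100, 16, 2)
print(f"CD PCM: {cd:.1f} kbps")  # 1411.2
for lossy in (128, 320):
    # Compression ratio of a lossy stream relative to CD PCM.
    print(f"{lossy} kbps MP3 -> {cd / lossy:.1f}:1 vs CD")  # 11.0:1, then 4.4:1
print(f"320 vs 128: {320 / 128:.1f}x the bits per second")  # 2.5x
```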
How it works
Pick a target: file size or bitrate you can afford.
R(D) curve tells you the lowest bitrate that still hits the quality you want.
Encoder runs a psychoacoustic model to find what the ear can't hear.
It spends its bit budget on audible detail, dumps the masked stuff first.
Result: smallest file for that quality, or best quality for that size.
Push the bitrate below the curve's knee and distortion climbs fast.
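The "spend bits on audible detail, dump the masked stuff" step can be sketched as a greedy allocator, loosely in the spirit of the iterative bit allocation used in early MPEG audio layers. This is a toy, not any real encoder: the band levels and masking thresholds are hypothetical, and it assumes each extra bit lowers a band's quantisation noise by ~6.02 dB.

```python
BIT_STEP_DB = 6.02  # assumed: each extra bit lowers a band's noise ~6 dB

def allocate_bits(signal_db, mask_db, budget):
    """Greedy toy allocator: hand out bits one at a time to the band whose
    quantisation noise sits furthest above its masking threshold."""
    bits = [0] * len(signal_db)
    for _ in range(budget):
        # With b bits, assume noise sits 6.02*b dB below the band's signal level
        # (0 bits = send nothing, so the error is as loud as the signal).
        nmr = [s - BIT_STEP_DB * b - m for s, b, m in zip(signal_db, bits, mask_db)]
        worst = max(range(len(nmr)), key=lambda i: nmr[i])
        if nmr[worst] <= 0:
            break  # all noise already under the mask: more bits would be inaudible
        bits[worst] += 1
    return bits

# Hypothetical band levels (dB) for one frame; the last band is already below its mask.
signal = [70, 60, 45, 30]
mask = [40, 38, 35, 36]
print(allocate_bits(signal, mask, budget=6))   # [4, 2, 0, 0]
print(allocate_bits(signal, mask, budget=12))  # [5, 4, 2, 0]
```

With the bigger budget the loop stops after 11 bits: once every band's noise is masked, the extra bit would buy nothing audible, which is the knee in miniature. The masked last band never gets a bit at all.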
Real examples
Bouncing a podcast: 96 kbps Opus or 128 kbps MP3 is plenty for voice, tiny file.
Music master for streaming: bounce WAV/FLAC, let the platform transcode, never upload a 128 MP3.
Show playback playlist: 320 kbps MP3 or 256 kbps AAC sounds full on a PA, still fits a USB stick.
Email/text a rough mix: drop to 128 kbps so it actually sends, mark it 'reference only'.
Archive masters: FLAC (lossless ~2:1) keeps every bit for re-edits later.
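File size is just bitrate x duration / 8. A Python sketch with hypothetical durations (a 60-minute podcast, a 4-minute song) puts numbers on the examples above; it ignores container headers and metadata, and the FLAC line simply assumes the rough ~2:1 ratio.

```python
def size_mb(kbps: float, minutes: float) -> float:
    """File size in MB (10**6 bytes) for a constant bitrate over a duration."""
    return kbps * 1000 * minutes * 60 / 8 / 1e6

podcast_min, song_min = 60, 4  # hypothetical durations
print(f"podcast, 96 kbps Opus:  {size_mb(96, podcast_min):.1f} MB")   # 43.2
print(f"podcast, 128 kbps MP3: {size_mb(128, podcast_min):.1f} MB")   # 57.6
wav = size_mb(1411.2, song_min)  # CD-quality PCM
print(f"song, CD WAV:          {wav:.1f} MB")                          # 42.3
print(f"song, FLAC at ~2:1:    {wav / 2:.1f} MB")                      # 21.2, varies with material
print(f"song, 320 kbps MP3:    {size_mb(320, song_min):.1f} MB")       # 9.6
print(f"song, 128 kbps MP3:    {size_mb(128, song_min):.1f} MB")       # 3.8
```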
How it helps in live sound
Walk-in / playback tracks for a gig: use 320 kbps MP3 or 256 kbps AAC minimum, never 128 on a big PA.
Run lossless WAV/FLAC off the playback laptop for the actual show, MP3 only as backup.
Streaming a live set: 128 kbps Opus is transparent for most listeners and saves bandwidth on dodgy venue wifi.
At 128 kbps, MP3 encoders typically low-pass the top end and leave coding artifacts; through tweeters that shows up as dull or brittle highs and a smeared cymbal wash.
Client sends a 128 kbps 'reference track'? Get the WAV before you EQ to it: the lossy version lies up top.
Voice/comedy/spoken word: 96-128 kbps is fine; reserve the big bitrates for music.
Everyday analogy
Like choosing photo quality on your phone: each step smaller saves space but blurs detail, and the theory marks the smartest spot to draw that line.
Watch out
Myth: 'higher bitrate always sounds better.' Truth: past the transparency knee (~256-320 kbps MP3) extra bits add size but no audible gain, while the first bits matter most.
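Why the first bits matter most: flip the Gaussian formula into its distortion-rate form, D(R) = sigma^2 x 2^(-2R), and watch how much error each extra bit removes. A Python sketch, assuming unit signal power (the function name is hypothetical):

```python
def gaussian_distortion(sigma2: float, rate: float) -> float:
    """Distortion-rate form of the Gaussian R(D): D(R) = sigma2 * 2**(-2R)."""
    return sigma2 * 2 ** (-2 * rate)

sigma2 = 1.0  # assumed unit signal power
prev = sigma2  # at R = 0 the best guess is the mean, so D = sigma2
for r in range(1, 6):
    d = gaussian_distortion(sigma2, r)
    print(f"bit {r}: removes {prev - d:.4f} of the error power, {d:.4f} left")
    prev = d
```

The first bit removes three quarters of the error power, and each later bit removes a quarter of what the previous one did. In decibels every bit is still worth ~6 dB; what shrinks is the absolute error each bit clears, and once the remaining error sits under the masking threshold (the transparency knee) the ear cannot hear the difference at all.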
Fun fact
The exact 44.1 kHz CD sample rate came from storing digital audio on video recorders via Sony's PCM adaptors; the acoustic requirement was just to sit above twice the ~20 kHz limit of hearing.
Key takeaways
R(D) = the hard floor: lowest bitrate for a given quality, you can't cheat it.
Each extra bit/sample buys ~6 dB SNR; the audible benefit of each step shrinks as you climb.
Lossy codecs dump sound the ear can't hear (masking), not random data.
320 kbps sounds fuller than 128 kbps because it keeps ~2.5x more bits.
Diminishing returns: first bits = huge quality jump, last bits = waste.