8. Psychoacoustics (Perception Layer) · Concept 17 of 18
Mel Scale
A way of measuring pitch by how far apart tones actually sound to people, rather than by their raw frequency in Hz.
The green-to-purple curve bends a flat Hz ruler into perceived pitch: roughly linear at low frequencies, squashed at high frequencies, anchored at 1000 Hz = 1000 mel.
What it is
A pitch ruler re-spaced so that equal steps sound equally far apart to human ears, not to a physics meter.
Key facts
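Anchor: a 1000 Hz tone at 40 dB above the listener's threshold is defined as exactly 1000 mels (the formula below gives 999.99, close enough for practical use).
Most common formula (O'Shaughnessy, 1987): m = 2595 x log10(1 + f/700), equivalently m = 1127 x ln(1 + f/700). m = mels, f = Hz, 700 = corner frequency in Hz, 2595 = scaling constant. Other variants exist.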
Reverse: f = 700 x (10^(m/2595) - 1), turning mels back into Hz.
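Below ~1000 Hz the scale is roughly LINEAR with Hz (100 Hz ≈ 150 mel, so about 1.5 mels per Hz at the low end); above ~1000 Hz it becomes roughly LOGARITHMIC (ever-bigger Hz jumps per pitch step). In the formula, the bend is set by the 700 Hz corner frequency.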
Worked numbers: 100 Hz approximately 150 mel; 1000 Hz = 1000 mel; 2000 Hz approximately 1521 mel; 4000 Hz approximately 2146 mel; 8000 Hz approximately 2840 mel.
Doubling Hz does NOT double mels: 1000 to 2000 Hz adds only about 521 mels, not 1000.
Word 'mel' comes from 'melody'; introduced by Stevens, Volkmann and Newman in 1937.
Mel filterbanks (triangular filters evenly spaced in mels) are the front end of MFCCs (filterbank energies, then a log, then a DCT), long the standard speech-recognition features.
Typical mel filterbank: 26 to 40 triangular filters across 0 Hz to Nyquist (e.g. 8 kHz at 16 kHz sample rate).
Cousins: the Bark scale (24 critical bands) and the ERB scale apply similar perceptual re-spacing. Human hearing spans roughly 20 Hz to 20 kHz.
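The forward and inverse formulas above can be checked in a few lines. This is a minimal sketch in plain Python (standard library only); the function names hz_to_mel and mel_to_hz are illustrative, not taken from any particular library, and only the O'Shaughnessy variant is implemented.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """O'Shaughnessy mel formula: m = 2595 * log10(1 + f/700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse formula: f = 700 * (10**(m/2595) - 1)."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


if __name__ == "__main__":
    for f in (100, 1000, 2000, 4000, 8000):
        m = hz_to_mel(f)
        # Round-trip back to Hz to confirm the two formulas are inverses.
        print(f"{f:>5} Hz -> {m:7.1f} mel -> {mel_to_hz(m):7.1f} Hz")
```

Running it should reproduce the worked numbers above (about 150, 1000, 1521, 2146 and 2840 mels), with each value converting back to its original frequency.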
How it works
Take a frequency in Hz (the physical pitch).
Feed it into m = 2595 x log10(1 + f/700) to get mels.
Low frequencies map roughly in proportion (100 Hz ≈ 150 mel); high frequencies get compressed.
Lay filters or analysis bands evenly along the MEL ruler.
Convert back with f = 700 x (10^(m/2595) - 1) when you need Hz again.
Result: the machine carves up sound the way ears judge pitch distance.
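The steps above can be sketched as a small filterbank builder. This is a minimal sketch under stated assumptions: 26 filters (the low end of the typical range quoted in the key facts), a 16 kHz sample rate, a 512-point FFT, filter corners left in continuous Hz rather than snapped to FFT bins, and no area normalisation. The name mel_filterbank and its parameters are hypothetical, not a library API; libraries such as librosa (librosa.filters.mel) provide tested versions with extra options.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    # O'Shaughnessy formula from the key facts.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    # Inverse formula: mels back to Hz.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters=26, n_fft=512, sample_rate=16000, f_min=0.0, f_max=None):
    """Hypothetical helper: triangular filters evenly spaced on the mel ruler.

    Returns (bank, edges_hz): bank[j] holds n_fft // 2 + 1 weights, one per FFT
    bin from 0 Hz to Nyquist; edges_hz holds the n_filters + 2 corner points.
    """
    if f_max is None:
        f_max = sample_rate / 2.0  # Nyquist
    n_bins = n_fft // 2 + 1
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]

    # Step 1: n_filters + 2 points spaced evenly in MELS, then converted to Hz.
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n_filters + 1)
    edges_hz = [mel_to_hz(m_lo + i * step) for i in range(n_filters + 2)]

    # Step 2: filter j rises from 0 at edges_hz[j] toward a peak of 1 at
    # edges_hz[j + 1], then falls back to 0 at edges_hz[j + 2].
    bank = []
    for j in range(n_filters):
        left, centre, right = edges_hz[j], edges_hz[j + 1], edges_hz[j + 2]
        row = []
        for f in bin_hz:
            if left <= f <= centre:
                row.append((f - left) / (centre - left))
            elif centre < f <= right:
                row.append((right - f) / (right - centre))
            else:
                row.append(0.0)
        bank.append(row)
    return bank, edges_hz


if __name__ == "__main__":
    bank, edges = mel_filterbank()
    print(f"{len(bank)} filters x {len(bank[0])} FFT bins")
    print(f"lowest filter:  {edges[0]:.0f} to {edges[2]:.0f} Hz")
    print(f"highest filter: {edges[-3]:.0f} to {edges[-1]:.0f} Hz")
```

With these defaults every filter is the same width in mels, yet the lowest one spans under 150 Hz while the highest spans roughly 6.5 kHz to 8 kHz: the mel ruler's compression shows up directly as ever-wider filters in Hz.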
Real examples
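Voice assistants such as Siri and Alexa typically build speech recognition on mel features (MFCCs or log-mel filterbanks), and many noise-reduction tools use similar perceptually spaced bands.
Pitch-correction tools like Auto-Tune work in semitones, a log-frequency pitch spacing that is a close cousin of mel rather than mel itself.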
Spectrogram displays in iZotope RX offer a 'mel' or log view to match hearing.
Voice biometrics and speaker ID extract mel features.
ML audio models (speech-to-text, sound classifiers) train on mel spectrograms.
How it helps in live sound
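EQ by ear with log-spaced bands: each 1/3-octave GEQ step is a constant frequency ratio (about 1.26x). Above ~1 kHz that is close to an even mel step, but below it the mel scale is nearly linear, so low bands cover far fewer mels each.
Notch low-mid mud (200-500 Hz) finely, but treat 2k-8k harshness in bigger Hz chunks, since the same Hz width spans far fewer mels up there and ears resolve highs more coarsely in absolute Hz.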
On RTA / analyser apps, switch the X-axis to LOG or mel, not linear, so the picture matches what you hear.
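Above 8 kHz, high-shelf moves need wide Hz spans to be heard: a 500 Hz nudge from 8 kHz to 8.5 kHz moves only about 63 mels, while 100 Hz to 600 Hz spans about 550.
Feedback hunting: a 31-band (1/3-octave, log-spaced) GEQ spaces its bands closer to how ears group pitch than a tool with linearly spaced bands, so its cuts tend to sound more natural.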
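To put numbers on how log-spaced GEQ bands relate to the mel scale, here is a minimal sketch that measures how many mels one 1/3-octave step (a frequency ratio of 2^(1/3), about 1.26) covers from different starting frequencies. It reuses the O'Shaughnessy formula; mel_step and THIRD_OCTAVE are illustrative names, and the starting frequencies are just sample points, not an exact ISO band list.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


THIRD_OCTAVE = 2.0 ** (1.0 / 3.0)  # frequency ratio of one 1/3-octave step


def mel_step(f_hz: float, ratio: float = THIRD_OCTAVE) -> float:
    """Mel distance covered by one log-spaced step upward from f_hz."""
    return hz_to_mel(f_hz * ratio) - hz_to_mel(f_hz)


if __name__ == "__main__":
    for f in (63, 125, 250, 500, 1000, 2000, 4000, 8000):
        print(f"{f:>5} Hz -> {f * THIRD_OCTAVE:6.0f} Hz: {mel_step(f):4.0f} mel")
```

The step grows from roughly 40-45 mels near 125 Hz to over 200 mels near 4 kHz, and it can never exceed 2595 x log10(2^(1/3)), about 260 mels. That is why 1/3-octave bands approximate even mel spacing much better above ~1 kHz than below it.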
Everyday analogy
It is like a map where 1 cm near home equals a short walk but 1 cm in the far corners equals a whole day's drive, so spacing matches how far things actually feel.
Watch out
Myth: doubling the Hz doubles the mels. Wrong: 1000 to 2000 Hz adds only about 521 mels because the scale compresses up high.
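A quick numeric check of the myth, as a minimal sketch reusing the same formula (doubling_ratio is an illustrative name): it divides the mel value of 2f by the mel value of f.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def doubling_ratio(f_hz: float) -> float:
    """How many times larger the mel value gets when f_hz is doubled."""
    return hz_to_mel(2 * f_hz) / hz_to_mel(f_hz)


if __name__ == "__main__":
    for f in (100, 500, 1000, 2000, 4000):
        print(f"{f:>5} -> {2 * f:>5} Hz: mels multiply by {doubling_ratio(f):.2f}")
```

The ratio stays below 2 everywhere and shrinks as frequency rises: about 1.88 starting from 100 Hz, 1.52 from 1000 Hz and 1.32 from 4000 Hz.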
Fun fact
A 1000 Hz tone is pinned to exactly 1000 mels by definition, so the whole human-pitch ruler is calibrated off one reference beep.
Key takeaways
Mel = pitch measured by perceived distance, not raw Hz.
Roughly linear below ~1 kHz, roughly logarithmic above it.
Anchor: 1000 Hz = 1000 mels exactly.
Formula: m = 2595 x log10(1 + f/700).
Powers MFCCs and mel spectrograms, the backbone of speech and audio ML.
Same idea behind log-frequency EQ and analyser views.