Mel Scale

A way of measuring pitch based on how far apart pitches actually sound to people, rather than on their raw frequency in Hz.

The green-to-purple curve bends a flat Hz ruler into perceived pitch: linear low down, squashed up high, anchored at 1000 Hz = 1000 mel.

What it is

A pitch ruler re-spaced so that equal steps sound equally far apart to human ears, not to a physics meter.

Key facts

Anchor: a 1000 Hz tone is defined as 1000 mels; the standard formula below reproduces this to within a small rounding error (about 999.99).
Standard formula (O'Shaughnessy): m = 2595 x log10(1 + f/700). m = mels, f = Hz, 700 = corner frequency in Hz, 2595 = scaling constant.
Reverse: f = 700 x (10^(m/2595) - 1), turning mels back into Hz.
Below ~1000 Hz the scale is roughly LINEAR with Hz; above ~1000 Hz it goes LOGARITHMIC (ever-bigger Hz jumps per pitch step).
Worked numbers: 100 Hz approximately 150 mel; 1000 Hz = 1000 mel; 2000 Hz approximately 1521 mel; 4000 Hz approximately 2146 mel; 8000 Hz approximately 2840 mel.
Doubling Hz does NOT double mels: 1000 to 2000 Hz adds only about 521 mels, not 1000.
Word 'mel' comes from 'melody'; introduced by Stevens, Volkmann and Newman in 1937.
Mel filterbanks (triangular filters evenly spaced in mels) are the front end of MFCCs, the standard speech-recognition features.
Typical mel filterbank: 26 to 40 triangular filters across 0 Hz to Nyquist (e.g. 8 kHz at 16 kHz sample rate).
Cousins: the Bark scale (24 critical bands) and the ERB scale do similar perceptual re-spacing. For reference, human hearing spans roughly 20 Hz to 20 kHz.
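
The formula, its reverse and the worked numbers above can be checked with a short Python sketch. The function names hz_to_mel and mel_to_hz are chosen here for illustration and are not taken from any particular library.

```python
import math


def hz_to_mel(f_hz):
    """O'Shaughnessy formula: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Reverse formula: f = 700 * (10^(m / 2595) - 1)."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


if __name__ == "__main__":
    # Reproduce the worked numbers from the list above.
    for f in (100, 1000, 2000, 4000, 8000):
        print(f"{f:>5} Hz -> {hz_to_mel(f):7.1f} mel")

    # Doubling Hz does not double mels.
    added = hz_to_mel(2000) - hz_to_mel(1000)
    print(f"1000 -> 2000 Hz adds {added:.0f} mel")
```

Running it prints one line per frequency, which can be compared directly with the worked numbers.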

How it works

Take a frequency in Hz (the physical pitch).
Feed it into m = 2595 x log10(1 + f/700) to get mels.
Low frequencies map almost linearly (equal Hz steps give nearly equal mel steps); high frequencies get compressed.
Lay filters or analysis bands evenly along the MEL ruler.
Convert back with f = 700 x (10^(m/2595) - 1) when you need Hz again.
Result: the machine carves up sound the way ears judge pitch distance.
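
The steps above can be sketched in Python as follows. This is a minimal illustration, not a full filterbank: it only computes where the band centres would fall. The helper name mel_centres and the choice of 26 bands from 0 Hz to 8 kHz are assumptions made for the example.

```python
import math


def hz_to_mel(f_hz):
    """m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """f = 700 * (10^(m / 2595) - 1)."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_centres(n_bands, f_min_hz, f_max_hz):
    """Centre frequencies in Hz of n_bands spaced evenly along the mel ruler.

    The range is split into n_bands + 1 equal mel steps; the interior
    points are the centres, and f_min_hz / f_max_hz are the outer edges.
    """
    m_lo = hz_to_mel(f_min_hz)
    m_hi = hz_to_mel(f_max_hz)
    step = (m_hi - m_lo) / (n_bands + 1)
    return [mel_to_hz(m_lo + step * i) for i in range(1, n_bands + 1)]


if __name__ == "__main__":
    # Hypothetical setup: 26 bands up to the 8 kHz Nyquist of 16 kHz audio.
    for i, f in enumerate(mel_centres(26, 0.0, 8000.0), start=1):
        print(f"band {i:2d}: {f:7.1f} Hz")
```

The centres are equally spaced in mels, so the gaps between neighbouring centres in Hz widen as frequency rises.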

Real examples

Speech systems of the kind behind Siri, Alexa and Zoom noise reduction typically start from mel filterbank features (MFCCs or log-mel energies).
Music apps like Shazam and auto-tune lean on perceptual pitch spacing.
Spectrogram displays in iZotope RX offer a 'mel' or log view to match hearing.
Voice biometrics and speaker ID extract mel features.
ML audio models (speech-to-text, sound classifiers) train on mel spectrograms.

How it helps in live sound

EQ by ear with log-spaced bands: above about 1 kHz each 1/3-octave GEQ step is a broadly similar mel jump, so sweep evenly across the band; below that, each step covers fewer mels.
Notch low-mid mud (200-500 Hz) finely, but treat 2k-8k harshness in bigger Hz chunks since ears resolve highs coarsely.
On RTA / analyser apps, switch the X-axis to LOG or mel, not linear, so the picture matches what you hear.
High-shelf moves above 8 kHz need wide Hz spans to sound musical; small Hz nudges up there are nearly inaudible.
Feedback hunting: a 31-band (1/3-octave, log-spaced) GEQ kills ringing more naturally than a linear-Hz tool.
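
To see how 1/3-octave GEQ steps compare with mel steps, this Python sketch prints the mel jump between neighbouring bands. It assumes the O'Shaughnessy formula from Key facts and uses the nominal 1/3-octave centres from 100 Hz to 10 kHz; the names THIRD_OCTAVE_HZ and mel_steps are made up for the example.

```python
import math


def hz_to_mel(f_hz):
    """m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


# Nominal 1/3-octave centre frequencies in Hz, 100 Hz to 10 kHz.
THIRD_OCTAVE_HZ = [
    100, 125, 160, 200, 250, 315, 400, 500, 630, 800,
    1000, 1250, 1600, 2000, 2500, 3150, 4000, 5000, 6300, 8000, 10000,
]


def mel_steps(freqs):
    """Return (low_hz, high_hz, mel_jump) for each adjacent pair of bands."""
    return [
        (lo, hi, hz_to_mel(hi) - hz_to_mel(lo))
        for lo, hi in zip(freqs, freqs[1:])
    ]


if __name__ == "__main__":
    for lo, hi, jump in mel_steps(THIRD_OCTAVE_HZ):
        print(f"{lo:>5} -> {hi:>5} Hz: {jump:6.1f} mel")
```

Under this formula the steps are not exactly equal in mels: the lowest bands come out at a few tens of mels per step, and the highest at a couple of hundred.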

Everyday analogy

It is like a map where 1 cm near home equals a short walk but 1 cm in the far corners equals a whole day's drive, so spacing matches how far things actually feel.