Real-time

Band energy

per-register
frequency-domain · low-latency · polyphonic · per-band

An envelope detector traces the loudness contour of a waveform — the slow outline riding over the fast carrier inside it. Every graph on this page is drawn by the method's real algorithm, and the sliders at the top drive all of them at once.

The whole method, live

[Interactive figure: band energy (per-register, polyphonic). Traces: Low, Mid, High. Control: Smoothing, 8 samples (0.2 ms).]

Score card

Causality: low-latency
Signal model: polyphonic
Reads: per-band
Latency: ≈1 frame
Cost: STFT
Domain: frequency

Scored qualitatively.

This method outputs a normalized contour (onset strength, per-band or perceptual loudness), not an amplitude in the units of the true envelope — so an amplitude error number would be meaningless. Its strength is the spectral axis: read the gallery below.

How it works

One contour isn't enough — track loudness per register. Split the spectrum into a few bands and follow each band's energy over time. Because a bassline and a cymbal occupy different bands, polyphony stops averaging into mush: here the low band rides the sustained notes while the high band spikes on the percussive hits.

Each band is normalized to its own peak so its shape is readable. In practice the bands are usually mel or critical bands. A plain STFT spectrogram is the limiting case: every bin is its own one-bin-wide band, so it gives you one envelope per bin.

Key terms

Frequency band
A slice of the spectrum — say a low / mid / high split — followed independently over time. Each band carries its own envelope, so a bassline and a cymbal never average into the same contour.
Mel / critical bands
Perceptually spaced bands that match how the ear groups frequency: narrow down low, wide up high. They are the usual choice in practice, because a band split that tracks hearing reads more like what you actually notice in the mix.
Per-band normalization
Each band scaled to its own peak so its shape stays readable regardless of absolute energy. A quiet high band and a loud low band both fill the same vertical range, so you compare their motion, not their level.

Building the envelope, step by step

One envelope can't describe a mix where a bassline and a cymbal sound at once. The fix is to stop asking for a single contour and follow energy per register instead — each graph below is drawn by the real algorithm on a polyphonic mix.

  1. Step 1: The raw mix

    Start with the polyphonic input — several voices at once, with no single carrier to demodulate. A lone amplitude follower would just average them into mush.

  2. Step 2: One contour per band

    Split the spectrum into a few bands and take each band's energy over time, normalized to its own peak. Now the low band rides the sustained notes while the high band spikes on the percussive hits — the polyphony is legible instead of blurred.

The code

Six readable forms of the exact algorithm that draws the curves above — C, JS and Python ports, an optimized C, a fixed-coefficient version, and a user-controlled one whose parameters match the sliders.

#include <math.h>

/* A magnitude STFT is assumed available:
     mag[k][m] = |X(bin k, frame m)|,  0 <= k < bins,  0 <= m < frames.
   (e.g. produced by some  stft(sig, mag, &frames, &bins);  helper.) */

/* Per-band RMS energy, each band normalized to its own peak.
   edges has nbands+1 entries: band b spans bins [edges[b], edges[b+1]).
   out[b] is a frames-long contour; caller allocates out[b]. */
void bands(const double *const *mag, int frames, int bins,
           const int *edges, int nbands, double **out) {
    for (int b = 0; b < nbands; b++) {
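        double peak = 0.0;
        /* clamp the band to [0, bins] so bad edges cannot index outside mag */
        int lo = edges[b] < 0 ? 0 : edges[b];
        int hi = edges[b + 1] > bins ? bins : edges[b + 1];
        for (int m = 0; m < frames; m++) {
            double acc = 0.0;
            int c = 0;
            for (int k = lo; k < hi; k++) {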
                acc += mag[k][m] * mag[k][m];  /* sum squared magnitudes */
                c++;
            }
            double e = c > 0 ? sqrt(acc / c) : 0.0;  /* RMS over the band */
            out[b][m] = e;
            if (e > peak) peak = e;
        }
        /* normalize this band to its own peak so its shape is readable */
        double inv = 1.0 / (peak > 0.0 ? peak : 1e-9);
        for (int m = 0; m < frames; m++) out[b][m] *= inv;
    }
}