How to Label Spectrograms for AI Models

In our latest release, we introduced spectrogram support for our Audio tag, so you can now label audio data for training or fine-tuning AI models using spectrograms. But what does this really mean, and how might it be helpful to you? Let’s break it down.

Previously, the Audio tag in Label Studio only displayed the waveform, showing volume changes over time. This was useful for spotting silence or segmenting audio, but it didn’t offer the detail needed for more complex tasks.

Unlike a waveform, a spectrogram shows frequency over time. It’s computed mathematically by taking short windows of the waveform and applying a Fourier transform to each, decomposing the sound wave into its component frequencies. Spectrograms show us the frequency patterns in the sound over time, using color to convey the loudness information that we get from the waveform.
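The windowing-plus-Fourier-transform idea can be sketched in a few lines of NumPy. This is a minimal illustration of a short-time Fourier transform, not what Label Studio uses internally; the window size, hop length, and sample rate below are arbitrary example values:

```python
import numpy as np

def spectrogram(signal, n_fft=512, hop=256):
    """Magnitude spectrogram via a short-time Fourier transform."""
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        # Window a short slice, then take its FFT to get the
        # frequency content of the sound at that moment in time.
        frame = signal[start:start + n_fft] * window
        frames.append(np.abs(np.fft.rfft(frame)))
    # Shape: (frequency bins, time frames)
    return np.array(frames).T

# One second of a 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t))

# The frequency bin nearest 440 Hz dominates each time frame
peak_hz = spec[:, 0].argmax() * sr / 512
```

Each column of the result is one moment in time, and each row is a frequency band; plotting the magnitudes as colors produces the familiar spectrogram image.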

There are many use cases where a spectrogram is more helpful than a waveform. One such example is speech recognition. Phoneticians, linguists who study the sounds of language, have long used spectrograms to visualize exactly which sounds were made by looking at formant values: the clear bands of resonant frequencies that show up in the spectrograms of speech.

In the example above, we look at the words “bat” and “bag”, which are phonetically identical except for their final sound. The “g” sound in the word bag produces what we call a velar pinch, a visual cue in a spectrogram where the 1st and 2nd formant values “pinch” together in sounds made when the tongue touches the velum (the soft palate at the back of the roof of your mouth, which you use to make a “k” or “g” sound). This information is not available in a waveform.

Other common use cases for spectrogram annotation include:

  • Bioacoustics, like animal sound classification, because animal calls have characteristic frequency contours that are clear visually.
  • Music information retrieval, because the distinct harmonic and rhythmic patterns of music appear visually in spectrograms.
  • Medical diagnostics from audio, like cough classification or heartbeat anomaly detection, because frequency shifts and patterns can reveal pathology not evident from the waveform alone.
  • Machine fault detection (acoustic anomaly detection), because faults in machinery monitored via audio or vibration signals often manifest as specific frequency anomalies over time.

How to Use Spectrograms in Label Studio

Ready to start annotating with spectrograms? It’s simple:

  • Set up your project using the Audio tag, as you normally would.
  • Open any task, then click the gear icon below the waveform (left side).
  • In the Audio Settings panel:
    • Adjust playback speed and zoom as needed
    • Scroll to the Spectrogram Settings section
    • Customize parameters like:
      • Number of Fourier transform samples (FFT)
      • Frequency scale (default is Mel, common for speech)
  • Finally, click “Show spectrogram.”
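For the first step, a minimal labeling configuration using the Audio tag looks like the following. The label values here are just placeholders; swap in whatever classes your project needs:

```xml
<View>
  <Labels name="label" toName="audio">
    <Label value="Speech"/>
    <Label value="Noise"/>
  </Labels>
  <Audio name="audio" value="$audio"/>
</View>
```

With this config in place, the gear icon and the Audio Settings panel described above are available on every task.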

The spectrogram will appear just below the waveform, giving you both volume and frequency views—everything you need for more accurate, detailed annotations.

Ready to learn more about spectrograms? Keep an eye out for our next In the Loop, where we’ll go into even more detail. Happy labeling!

