In our latest release, we introduced spectrogram support for our Audio tag, so you can now use spectrograms when labeling audio data for training or fine-tuning AI models. But what does this really mean, and how might it help you? Let’s break it down.
Previously, the Audio tag in Label Studio only displayed the waveform, which plots amplitude (roughly, volume) over time. This was useful for spotting silence or segmenting audio, but it doesn’t reveal which frequencies are present, and that is the detail more complex tasks need.
Unlike a waveform, a spectrogram shows how the frequency content of a sound changes over time. It’s computed by splitting the waveform into short windows and applying a Fourier transform to each one, which decomposes the sound in that window into its component frequencies. The result shows the frequency patterns in the sound over time, with color encoding how much energy (loudness) each frequency carries at each moment.
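To make the windowing idea concrete, here is a minimal sketch of a short-time Fourier transform (STFT) spectrogram in pure Python, so it runs without any dependencies. The window size, hop length, and Hann window are illustrative assumptions, not the settings Label Studio uses, and the naive DFT stands in for the fast FFT a real tool would use.

```python
import cmath
import math


def hann(n: int) -> list[float]:
    """Hann window of length n; tapers each frame's edges to reduce leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft_magnitudes(frame: list[float]) -> list[float]:
    """Magnitude of a naive DFT for the non-negative frequency bins 0..N/2."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def spectrogram(signal: list[float], window_size: int = 256, hop: int = 128) -> list[list[float]]:
    """Slide a window over the signal and return one magnitude spectrum per frame.

    Bin k of each frame corresponds to frequency k * sample_rate / window_size.
    """
    win = hann(window_size)
    frames = []
    for start in range(0, len(signal) - window_size + 1, hop):
        # Apply the window to this chunk, then decompose it into frequencies.
        chunk = [s * w for s, w in zip(signal[start:start + window_size], win)]
        frames.append(dft_magnitudes(chunk))
    return frames


def to_db(mag: float, floor: float = 1e-10) -> float:
    """Convert a magnitude to decibels; a display would map this value to color."""
    return 20 * math.log10(max(mag, floor))
```

For example, feeding in 0.1 seconds of a 440 Hz tone sampled at 8 kHz yields 5 frames of 129 bins each, with the strongest bin near 440 Hz (bin 14, since each bin spans 8000 / 256 = 31.25 Hz). Plotting frames along the x-axis, bins along the y-axis, and `to_db` values as color gives the familiar spectrogram image.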
There are many use cases where a spectrogram is more helpful than a waveform. One example is speech recognition. Phoneticians, linguists who study the sounds of language, use spectrograms to identify which sounds were produced by examining formants: the distinct bands of energy at the resonant frequencies of the vocal tract that show up clearly in spectrograms of speech.
In the example above, we compare the words “bat” and “bag”, which differ phonetically only in their final sound. The “g” in “bag” produces what we call a velar pinch: a visual cue in the spectrogram where the second and third formants (F2 and F3) converge, or “pinch” together, next to sounds made when the back of the tongue touches the velum (the soft palate at the back of the roof of your mouth, which you use to make a “k” or “g” sound). This cue stands out in a spectrogram but is not visible in a waveform.
Other common use cases for spectrogram annotation include:
Ready to start annotating with spectrograms? It’s simple:
The spectrogram appears just below the waveform, giving you both volume and frequency views: everything you need for more accurate, detailed annotations.
Ready to learn more about spectrograms? Keep an eye out for our next In the Loop, where we’ll go into even more detail. Happy labeling!