Hands-On Natural Language Processing with Python by Rajalingappaa Shanmugamani, Rajesh Arumugam

Audio signal representation

Let's now look at how to extract the frequency spectrum from the spoken digits dataset. This dataset contains recordings of spoken digits stored as .wav files. We will use the librosa library, which is commonly used for audio data analysis. First, install the package with the following command:

pip install librosa

For other installation methods, see https://github.com/librosa/librosa. We will use the Mel-frequency cepstral coefficients (MFCCs) of the audio signal as features. MFCCs are a compact representation of the short-term power spectrum, computed from short time frames of the signal. The main assumption is that for short durations of the order of 20 ms to 40 ms, the frequency spectrum ...
