A few reminders on spectrograms and the mel scale

As we will see in the next sections, several efficient techniques used in state-of-the-art TTS systems (deep learning-based or otherwise) rely on tricks from the signal processing world. For instance, it is often preferable to have a model generate a spectrogram of the signal and then apply a conversion algorithm to obtain the waveform, rather than to predict the waveform directly. This can give better results in less time. This section is a quick recap on spectrograms; it will help you understand many of the ideas presented later in the chapter.

Essentially, a spectrogram is a way to represent the strength of an audio signal at different frequencies as it varies over time. It can be shown on a two-dimensional graph, where the x axis is ...
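To make the idea concrete, here is a minimal sketch of how a magnitude spectrogram could be computed from raw samples. It is not taken from the book: the function name `spectrogram`, the parameters `frame_size` and `hop`, the Hann window, and the 1 kHz test tone are all assumptions chosen for illustration. It uses only the standard library and a naive discrete Fourier transform so that every step is visible; in practice a library such as NumPy or librosa would do the same work much faster.

```python
import cmath
import math


def spectrogram(samples, frame_size=256, hop=128):
    """Return a list of frames; each frame holds one magnitude per frequency bin.

    Hypothetical helper for illustration: a naive short-time Fourier transform.
    """
    # A Hann window tapers each frame to reduce spectral leakage at its edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    n_bins = frame_size // 2 + 1  # non-negative frequencies of a real signal
    frames = []
    # Slide over the signal in steps of `hop` samples (frames overlap).
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        for k in range(n_bins):
            # Naive DFT: correlate the frame with a complex sinusoid of bin k.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


# Assumed test signal: a pure 1 kHz sine sampled at 8 kHz.
sample_rate = 8000
tone_hz = 1000
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(2048)]

spec = spectrogram(signal)

# Bin k corresponds to the frequency k * sample_rate / frame_size (in Hz).
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 256
print(len(spec), "frames,", len(spec[0]), "bins, peak at", peak_hz, "Hz")
```

Each inner list is one time slice and each position within it is one frequency band, so the strongest value in every slice of this test signal sits in the bin that corresponds to the tone's frequency.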
