A few reminders on spectrograms and the mel scale

As we will see in the next sections, several efficient techniques used in state-of-the-art TTS systems (deep learning-based or otherwise) rely on tricks from the signal processing world. For instance, it is often preferable to generate a spectrogram of a signal and then apply a conversion algorithm to turn it into a waveform, rather than to predict the waveform directly. This can give better results in less time. This section is a quick recap of spectrograms, and it will help you understand many of the ideas presented later in the chapter.

Essentially, a spectrogram is a way to represent the strength of an audio signal across its frequencies as it changes over time. It can be shown on a two-dimensional graph, where the x axis is ...
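
To make the idea concrete, here is a minimal sketch of how a magnitude spectrogram could be computed with NumPy alone, using a short-time Fourier transform. The function names (`spectrogram`, `hz_to_mel`), the frame and hop sizes, and the array layout (rows are frequency bins, columns are time frames) are assumptions made for this illustration and are not taken from the chapter. The mel conversion uses one common formula among several variants.

```python
import numpy as np


def spectrogram(signal, sample_rate, frame_length=1024, hop_length=256):
    """Magnitude spectrogram via a short-time Fourier transform.

    Illustrative helper: frame_length and hop_length are arbitrary defaults.
    """
    window = np.hanning(frame_length)
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    # Slice the signal into overlapping frames and apply the window to each.
    frames = np.stack([
        signal[i * hop_length: i * hop_length + frame_length] * window
        for i in range(n_frames)
    ])
    # rfft keeps only the non-negative frequencies of a real-valued signal.
    # Transpose so that rows are frequency bins and columns are time frames.
    magnitudes = np.abs(np.fft.rfft(frames, axis=1)).T
    freqs = np.fft.rfftfreq(frame_length, d=1.0 / sample_rate)
    # Time (in seconds) of the centre of each frame.
    times = (np.arange(n_frames) * hop_length + frame_length / 2) / sample_rate
    return freqs, times, magnitudes


def hz_to_mel(freq_hz):
    """Convert Hz to mel using one common formula (an assumption here)."""
    return 2595.0 * np.log10(1.0 + np.asarray(freq_hz, dtype=float) / 700.0)


# One second of a 440 Hz sine tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

freqs, times, mags = spectrogram(tone, sample_rate)
# The frequency bin with the most energy should sit close to 440 Hz.
peak_hz = freqs[mags.mean(axis=1).argmax()]
mel_freqs = hz_to_mel(freqs)
```

With these settings, each frequency bin is about 15.6 Hz wide (16000 / 1024), so the strongest bin lands within one bin of the 440 Hz tone. Passing the bin frequencies through `hz_to_mel` shows how the mel scale compresses high frequencies relative to low ones.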
