WaveNet, in brief
The WaveNet paper was published in 2016 and reported results that outperformed classical TTS approaches. At its core, WaveNet is a generative model for audio: it takes a sequence of audio samples as input and predicts the most likely next sample. With an extra input, it can be conditioned to perform other tasks. For instance, if the audio transcript is also provided during training, WaveNet can be turned into a TTS system.
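To make the "predict the next sample from the previous ones" idea concrete, here is a minimal sketch of an autoregressive generation loop. It is not the WaveNet model: `generate` and `toy_predictor` are hypothetical names, and the predictor is a trivial stand-in (it assumes samples are integers in the range 0 to 255) used in place of a trained network.

```python
def generate(predict_next, seed, n_steps):
    """Extend `seed` by n_steps samples, one sample at a time.

    `predict_next` is any function that maps the history of samples
    so far to the next sample; in a real system it would be a trained
    network (assumption: here it is just a plain Python callable).
    """
    samples = list(seed)
    for _ in range(n_steps):
        # Each new sample depends only on the samples generated so far.
        samples.append(predict_next(samples))
    return samples


def toy_predictor(history):
    """Stand-in for a trained model: next value is last value + 1, modulo 256."""
    return (history[-1] + 1) % 256


print(generate(toy_predictor, seed=[254], n_steps=3))
```

Each generated sample is fed back as input for the next prediction, which is why generation of this kind proceeds one step at a time.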
WaveNet uses many interesting ideas to train very deep neural networks. The main concept involves dilated causal convolutions (check out the paper to learn more about them).
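As a rough illustration of what a dilated causal convolution computes, the following sketch implements a single 1D layer in plain Python. The function names, the kernel weights, and the dilation values are assumptions chosen for illustration; a real implementation would use a deep learning framework and learned weights.

```python
def dilated_causal_conv(x, weights, dilation):
    """Compute y[t] = sum_k weights[k] * x[t - k * dilation].

    Causal: y[t] only looks at x[t] and earlier samples.
    Dilated: the taps are spaced `dilation` samples apart.
    Positions before the start of the signal are treated as zero.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Number of input samples seen by one output of the stacked layers."""
    return 1 + (kernel_size - 1) * sum(dilations)


signal = [1, 2, 3, 4]
print(dilated_causal_conv(signal, weights=[1, 1], dilation=2))
print(receptive_field(kernel_size=2, dilations=[1, 2, 4, 8]))
```

Stacking layers whose dilation doubles each time lets the receptive field grow quickly with depth while each layer still uses a small kernel, and the causal indexing ensures that no output depends on future samples.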
In the paper, TTS is tackled, among other tasks, and the model is not directly fed with raw text, but with ...