
Hands-On Natural Language Processing with Python by Rajalingappaa Shanmugamani, Rajesh Arumugam


Details of the architecture

The Adam optimizer (https://arxiv.org/abs/1412.6980) is used with a particular learning rate schedule: the learning rate is initialized at 0.001 and is then reduced to 0.0005, 0.0003, and 0.0001 after 500,000, 1 million, and 2 million global steps, respectively. A step is a single gradient update; it shouldn't be confused with an epoch, which is a full cycle of gradient updates across the entire training dataset.
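This piecewise-constant schedule can be written as a small helper function. The sketch below is illustrative only; the function name and structure are our own, not taken from the book's code:

def learning_rate(global_step):
    """Piecewise-constant learning rate schedule described above."""
    if global_step < 500_000:
        return 0.001    # initial learning rate
    elif global_step < 1_000_000:
        return 0.0005   # after 500,000 steps
    elif global_step < 2_000_000:
        return 0.0003   # after 1 million steps
    else:
        return 0.0001   # after 2 million steps

The returned value would be fed to the Adam optimizer at each training step, for example by recomputing it whenever the global step counter crosses one of the boundaries.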

An L1 loss is used both for the encoder-decoder, which predicts the mel-spectrograms, and for the postprocessing block, which predicts the linear-scale spectrograms.
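As a minimal sketch of how the two L1 terms combine into a single training objective (using NumPy, with illustrative variable names rather than the book's own code):

import numpy as np

def l1_loss(pred, target):
    # Mean absolute error between prediction and target
    return np.mean(np.abs(pred - target))

def total_loss(mel_pred, mel_target, lin_pred, lin_target):
    # Sum of the two L1 terms: one on the decoder's mel-spectrogram
    # output, one on the postprocessing block's spectrogram output.
    return l1_loss(mel_pred, mel_target) + l1_loss(lin_pred, lin_target)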
