Hands-On Natural Language Processing with Python by Rajalingappaa Shanmugamani, Rajesh Arumugam

Data preparation

Training Tacotron requires several preprocessing steps on this dataset. The normalized text in metadata.csv must be put into the shape that the encoder expects as input. We also need to extract the mel and magnitude spectrograms, which serve as targets for the outputs of the decoder and the postprocessing CBHG module, respectively.
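As a rough illustration of what giving the text "the proper shape" for the encoder can look like, the sketch below maps each transcript to a fixed-length array of integer character ids. The vocabulary, the padding index, and the helper names (`encode_text`, `encode_batch`) are assumptions made for this example only, not necessarily the choices made later in this chapter.

```python
import numpy as np

# Assumed character set (not from the source): index 0 is reserved for padding.
PAD = "\0"
VOCAB = PAD + "abcdefghijklmnopqrstuvwxyz '.,?!-"
CHAR_TO_ID = {ch: i for i, ch in enumerate(VOCAB)}


def encode_text(text, max_len):
    """Lowercase, drop characters outside VOCAB, then pad or truncate to max_len ids."""
    ids = [CHAR_TO_ID[ch] for ch in text.lower() if ch in CHAR_TO_ID]
    ids = ids[:max_len]
    return np.array(ids + [0] * (max_len - len(ids)), dtype=np.int32)


def encode_batch(texts, max_len):
    """Stack encoded transcripts into a (batch, max_len) integer matrix."""
    return np.stack([encode_text(t, max_len) for t in texts])


batch = encode_batch(["Hello, world.", "Printing"], max_len=16)
print(batch.shape)
```

Padding every transcript to the same length lets a whole batch be fed to the encoder as a single integer matrix; the value of `max_len` here is arbitrary.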

The data can be loaded with the pandas read_csv function. We need to account for three properties of the CSV file: it has no header row, it uses the pipe character to separate the columns, and it contains quotation marks that are not always closed (the transcripts are not always full sentences):

metadata = pd.read_csv('data/LJSpeech-1.1/metadata.csv', dtype='object', ...
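The call above is cut off in this excerpt, so its remaining arguments are not shown. As a hedged sketch of how the three properties just described map onto standard `read_csv` parameters, the example below parses two made-up rows from an in-memory buffer instead of the real file. The sample rows and the column names are assumptions for illustration only.

```python
import csv
import io

import pandas as pd

# Two made-up rows in a pipe-separated layout: id | transcript | normalized transcript.
# The second row contains a quotation mark that is never closed.
sample = io.StringIO(
    'LJ000-0001|Chapter 1 begins here.|Chapter one begins here.\n'
    'LJ000-0002|"and so it went on,|"and so it went on,\n'
)

metadata = pd.read_csv(
    sample,
    dtype='object',          # keep every field as a plain string
    header=None,             # the file has no header row
    sep='|',                 # columns are separated by the pipe character
    quoting=csv.QUOTE_NONE,  # treat quotation marks as ordinary characters
    names=['wav', 'transcript', 'normalized'],  # hypothetical column names
)

print(metadata)
```

With `quoting=csv.QUOTE_NONE`, an unclosed quotation mark is kept as a literal character rather than being interpreted as the start of a quoted field that could swallow the following lines.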
