Hands-On Natural Language Processing with Python by Rajalingappaa Shanmugamani, Rajesh Arumugam

The dataset

We will use the LJ Speech dataset for this task (https://keithito.com/LJ-Speech-Dataset/). It contains 13,100 .wav recordings, each with a corresponding transcript. The transcripts are available in both raw and normalized form. In the normalized form, numbers are spelled out as full words.

All of the recordings were made by the same speaker. The audio totals roughly 24 hours, with individual clips lasting from 1 to 10 seconds. The dataset is in the public domain, and there are no restrictions on its use.
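To check the clip count and durations against a local copy, a minimal sketch along the following lines may help. It assumes the recordings are uncompressed PCM .wav files sitting in a single folder; the function names `wav_duration_seconds` and `summarize_durations` are hypothetical helpers introduced here for illustration, not part of the dataset or the book's code.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of one PCM .wav file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        # Duration is the number of audio frames divided by the sample rate.
        return wav_file.getnframes() / wav_file.getframerate()


def summarize_durations(wav_dir):
    """Return (count, total_hours, shortest_s, longest_s) for a folder of .wav files.

    Assumption: every recording sits directly inside wav_dir.
    """
    durations = [wav_duration_seconds(p) for p in sorted(Path(wav_dir).glob("*.wav"))]
    if not durations:
        raise ValueError(f"no .wav files found in {wav_dir}")
    return len(durations), sum(durations) / 3600, min(durations), max(durations)
```

Calling `summarize_durations` on the folder that holds the extracted recordings returns the number of clips, the total length in hours, and the shortest and longest clip in seconds.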

Note that the dataset occupies roughly 3.8 GB on disk once the archive, downloadable from the preceding link, has been extracted.

The dataset folder contains a CSV file ...
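As a rough illustration of how such a transcript file could be read, the sketch below assumes a layout that this excerpt does not confirm: one clip per line, with three pipe-separated fields holding the clip ID, the raw transcript, and the normalized transcript. The function name `load_transcripts` and its `csv_path` parameter are hypothetical; adjust the delimiter and field order if the actual file differs.

```python
import csv


def load_transcripts(csv_path):
    """Map each clip ID to its (raw, normalized) transcript pair.

    Assumption: each row is `clip_id|raw transcript|normalized transcript`.
    """
    transcripts = {}
    with open(csv_path, encoding="utf-8", newline="") as handle:
        # QUOTE_NONE keeps quotation marks inside transcripts as literal text.
        reader = csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) != 3:
                continue  # skip rows that do not match the assumed layout
            clip_id, raw_text, normalized_text = row
            transcripts[clip_id] = (raw_text, normalized_text)
    return transcripts
```

Disabling quote handling is a deliberate choice here: transcripts of read speech often contain quotation marks, and the default CSV dialect would treat them as field delimiters rather than text.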
