Training the model

Now that we understand the data we are using and the DeepSpeech model architecture, let's set up the environment to train the model. Creating a virtual environment for the project is an optional preliminary step, but it is always recommended. Using GPUs to train these models is also recommended.

In addition to Python 3.5 and TensorFlow 1.7+, the prerequisites include the following:

  • python-Levenshtein: To compute the character error rate (CER), which is based on the Levenshtein (edit) distance between the predicted and reference transcripts
  • python_speech_features: To extract MFCC features from raw audio
  • pysoundfile: To read FLAC files
  • scipy: Helper functions for windowing
  • tqdm: For displaying a progress bar
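As a minimal sketch of what the CER metric measures, the pure-Python functions below compute the same edit distance that python-Levenshtein's `Levenshtein.distance(reference, hypothesis)` returns, then divide it by the length of the reference transcript. The names `levenshtein` and `cer` are hypothetical, chosen for illustration, and are not part of any of the packages listed above. The sketch assumes spaces count as characters; the project's own evaluation code may normalize transcripts differently.

```python
def levenshtein(ref, hyp):
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn ref into hyp (dynamic programming, two rows)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Hypothetical helper: character error rate, i.e. the edit distance
    normalized by the reference length (spaces are counted as characters)."""
    if not reference:
        raise ValueError("reference transcript must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


# One substitution ("t" -> "d") over an 11-character reference.
print(round(cer("the cat sat", "the cat sad"), 3))
```

In practice, the C-backed `Levenshtein.distance` is much faster than this pure-Python loop, which is why the package is listed as a prerequisite.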

Let's create the virtual environment and install ...
