The strategy discussed above is coded as follows (the code file is available as Voice transcription.ipynb on GitHub):
- Download the dataset and import the relevant packages:
    $ wget http://www.openslr.org/resources/12/train-clean-100.tar.gz
    $ tar xzvf train-clean-100.tar.gz

    import librosa
    import numpy as np
    import pandas as pd
- Read all the file names and their corresponding transcriptions and turn them into separate lists:
    import os, numpy as np
    org_path = '/content/LibriSpeech/train-clean-100/'
    count = 0
    inp = []
    k = 0
    audio_name = []
    audio_trans = []
    for dir1 in os.listdir(org_path):
        dir2_path = org_path+dir1+'/'
        for dir2 in os.listdir(dir2_path):
            dir3_path = dir2_path+dir2+'/'
            for audio in os.listdir(dir3_path):
                if audio.endswith('.txt'):
                    ...
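The body of the loop is elided in the listing above. As a separate illustration, the following is a minimal sketch of one way to pair audio files with their transcriptions. It assumes the usual LibriSpeech layout, in which each chapter folder holds a `.trans.txt` file whose lines consist of an utterance ID followed by the sentence, with the audio stored next to it as `<ID>.flac`. The function name `collect_transcripts` is hypothetical and is not taken from the notebook.

```python
import os

def collect_transcripts(root):
    """Return parallel lists of audio paths and transcriptions found under root.

    Assumption: each .txt file contains lines of the form
    '<utterance-id> <TRANSCRIPTION>' and the matching audio file is
    '<utterance-id>.flac' in the same folder.
    """
    audio_name, audio_trans = [], []
    # os.walk visits every speaker/chapter folder without hand-written nesting.
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in sorted(filenames):
            if not fname.endswith('.txt'):
                continue
            with open(os.path.join(dirpath, fname), encoding='utf-8') as f:
                for line in f:
                    line = line.strip()
                    if not line:
                        continue
                    # Split on the first space only: the ID, then the sentence.
                    utt_id, _, text = line.partition(' ')
                    audio_name.append(os.path.join(dirpath, utt_id + '.flac'))
                    audio_trans.append(text)
    return audio_name, audio_trans
```

Calling `collect_transcripts(org_path)` would then give two lists of equal length, one of audio file paths and one of the corresponding sentences, which is the shape the step above describes.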