The Speech Commands dataset contains around 65,000 WAV files, organized into sub-folders named after the labels; each file is a recording of one of 30 words spoken by different speakers. In the Getting ready section of this recipe, we learned how to read a WAV file and obtain its frequency-amplitude representation by applying the STFT. In this section, we will extend the same idea to write a data generator and then train a neural network to recognize the spoken word.
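As a quick refresher on that idea, here is a minimal sketch of reading a single recording and taking its STFT magnitude. It assumes the tuneR and signal packages and a hypothetical file path; the Getting ready section may use different packages and parameters:

library(tuneR)    # readWave()
library(signal)   # specgram() computes an STFT-based spectrogram

# read one recording (hypothetical path into one of the label sub-folders)
wav <- readWave("data/data_speech_commands_v0.01/bed/00f0204f_nohash_0.wav")

# raw samples scaled to [-1, 1], plus the sampling rate
samples <- wav@left / 32768
fs <- wav@samp.rate

# short-time Fourier transform: 256-sample windows with 128-sample overlap
stft <- specgram(samples, n = 256, Fs = fs, overlap = 128)

# magnitude (frequency-amplitude representation) used as the model input
magnitude <- abs(stft$S)
dim(magnitude)   # frequency bins x time frames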
Let's begin by preparing a dataset for the generator:
- First, we list all the files inside the data_speech_commands_v0.01 folder and create a DataFrame:
# list every file in the dataset folder (recursively, keeping the relative paths)
files = list.files("data/data_speech_commands_v0.01", all.files = T, full.names = F, recursive = T)
paste("Number of audio files in the dataset:", length(files))
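Because full.names = F, each entry in files is a relative path whose leading sub-folder is the class label, so the data frame can be built directly from the paths. The following is a minimal sketch; the object names (wav_files, file_df) and the filtering steps are illustrative assumptions rather than the recipe's exact code:

# keep only the WAV recordings (the folder also ships text files such as the testing list)
wav_files <- files[grepl("\\.wav$", files)]

# the sub-folder name is the class label, e.g. "yes/0a7c2a8d_nohash_0.wav" -> "yes"
file_df <- data.frame(
  fname = wav_files,
  class = sapply(strsplit(wav_files, "/"), function(x) x[1]),
  stringsAsFactors = FALSE
)

# the _background_noise_ folder holds noise clips, not one of the 30 target words
file_df <- file_df[file_df$class != "_background_noise_", ]

head(file_df)
table(file_df$class)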