How to do it...

The Speech Commands dataset contains ~65,000 WAV files, organized into sub-folders named after the labels; each file is a recording of one of 30 words spoken by different speakers. In the Getting ready section of this recipe, we learned how to read a WAV file and obtain its frequency-amplitude representation by applying the STFT. In this section, we'll extend the same idea to write a data generator and then train a neural network to recognize the spoken word.
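As a quick recap of that idea, a minimal sketch of reading one recording and applying the STFT might look like the following (this assumes the tuneR and seewave packages; the file path is illustrative, not a specific file from the book):
library(tuneR)
library(seewave)
# Read one recording (path is illustrative).
wave <- readWave("data/data_speech_commands_v0.01/bed/example.wav")
# Short-time Fourier transform with a 512-sample window; plot = FALSE
# returns the numeric result instead of drawing a spectrogram.
spec <- spectro(wave, f = wave@samp.rate, wl = 512, plot = FALSE)
# spec$amp is the frequency-amplitude matrix (in dB) that a generator
# can feed to the network; spec$time and spec$freq are the axes.
dim(spec$amp)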

Let's begin by preparing a dataset for the generator:

  1. First, we list all the files inside the data_speech_commands_v0.01 folder and create a DataFrame:
files <- list.files("data/data_speech_commands_v0.01", all.files = TRUE,
                    full.names = FALSE, recursive = TRUE)
paste("Number of audio files in the dataset:", length(files))

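One way to build the DataFrame mentioned in step 1 is to take each file's label from the sub-folder that contains it. The sketch below assumes that layout; the column names fname and class are illustrative, and in practice you may also want to drop non-WAV entries from the dataset root:
# Sketch only: derive each file's label from its sub-folder name,
# e.g. "bed/some_recording.wav" -> "bed".
file_df <- data.frame(
  fname = files,
  class = basename(dirname(files)),
  stringsAsFactors = FALSE
)
head(file_df)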