The other chapters in this book are all about processing images or text. Those chapters reflect the balance of media in deep learning research, but that is not to say that sound processing isn’t interesting or that we haven’t seen some great developments in this area in the last few years. Speech recognition and speech synthesis are what made home assistants like Amazon Alexa and Google Home possible. The old sitcom joke where the phone dials the wrong number hasn’t really been current since Siri came out.
It is easy to start experimenting with these systems; there are APIs out there that let you get a simple voice app up and running in a few hours. The voice processing, however, is done in Amazon’s, Google’s, or Apple’s data centers, so we can’t really count these as deep learning experiments. Building state-of-the-art voice recognition systems is hard, although Mozilla’s Deep Speech is making some impressive progress.
This chapter focuses on music. We’ll start by training a music classification model that can tell us what kind of music we’re listening to. We’ll then use the results of this model to index local MP3s, making it possible to find songs that are similar in style. After that we’ll use the Spotify API to create a corpus of public playlists, which we’ll use to train a music recommender.
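To make the idea of finding songs similar in style a little more concrete, here is a minimal sketch of a similarity lookup. It assumes each song has already been reduced to a small vector of numbers; the filenames, the vectors, and the helper names `cosine_similarity` and `most_similar` are all made up for illustration and are not the code from the chapter's notebooks. Cosine similarity is used here as one common choice of distance, not necessarily the one the recipes settle on.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def most_similar(query, index, top_n=2):
    """Return the names of the top_n songs closest in style to `query`."""
    query_vec = index[query]
    scored = [
        (cosine_similarity(query_vec, vec), name)
        for name, vec in index.items()
        if name != query  # do not return the query song itself
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_n]]


# Hypothetical index: filename -> style vector (invented values).
index = {
    "jazz_a.mp3": [0.9, 0.1, 0.0],
    "jazz_b.mp3": [0.8, 0.2, 0.1],
    "metal_a.mp3": [0.0, 0.1, 0.9],
    "rock_a.mp3": [0.1, 0.6, 0.7],
}

print(most_similar("jazz_a.mp3", index, top_n=1))
```

With these invented vectors, the nearest neighbor of `jazz_a.mp3` is `jazz_b.mp3`. In practice the vectors would come from a trained model rather than being written by hand.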
The notebooks for this chapter are:
15.1 Song Classification
15.2 Index Local MP3s
15.3 Spotify Playlists
15.4 Train a Music Recommender