Building an audio recognition model using siamese networks

In the last tutorial, we saw how to use siamese networks to recognize a face. Now we will see how to use them to recognize audio. We will train our network to distinguish the sound of a dog from the sound of a cat. The dataset of cat and dog audio can be downloaded from here: https://www.kaggle.com/mmoreaux/audio-cats-and-dogs#cats_dogs.zip.

Once we have downloaded the data, we split it into three folders: Dogs, Sub_dogs, and Cats. We place the dog barking audio in Dogs and Sub_dogs, and the cat audio in Cats. The objective of our network is to recognize whether a given audio clip is a dog's bark or some other sound. As we know, ...
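The folder layout described above could be prepared with a short script. What follows is only a minimal sketch under several assumptions that the text does not state: that the extracted .wav files sit in a single directory, that dog clips have names starting with "dog" and cat clips with "cat", and that the dog clips are divided evenly between Dogs and Sub_dogs. The helper names plan_folders and copy_into_folders are hypothetical.

```python
from pathlib import Path
import shutil


def plan_folders(filenames):
    """Assign each file name to Dogs, Sub_dogs, or Cats.

    Assumption: dog clips start with "dog" and cat clips with "cat".
    Files matching neither prefix are ignored.
    """
    dogs = sorted(f for f in filenames if f.startswith("dog"))
    cats = sorted(f for f in filenames if f.startswith("cat"))
    # Assumption: an even split of dog clips between the two dog folders.
    half = len(dogs) // 2
    return {"Dogs": dogs[:half], "Sub_dogs": dogs[half:], "Cats": cats}


def copy_into_folders(source_dir, target_dir):
    """Copy the .wav files from source_dir into the three folders."""
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    names = [p.name for p in source_dir.glob("*.wav")]
    for folder, files in plan_folders(names).items():
        dest = target_dir / folder
        dest.mkdir(parents=True, exist_ok=True)
        for name in files:
            shutil.copy(source_dir / name, dest / name)
```

Calling copy_into_folders with the directory that holds the extracted clips and an output directory would create Dogs, Sub_dogs, and Cats under the output directory. The split ratio and the prefix test should be adjusted to match the actual file names in the download.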
