The roots of the current deep learning boom go surprisingly far back, to the 1950s. While vague ideas of “intelligent machines” can be found further back in fiction and speculation, the 1950s and ’60s saw the introduction of the first “artificial neural networks,” based on a dramatically simplified model of biological neurons. Amongst these models, the Perceptron system articulated by Frank Rosenblatt garnered particular interest (and hype). Connected to a simple “camera” circuit, it could learn to distinguish different types of objects. Although the first version ran as software on an IBM computer, subsequent versions were implemented directly in hardware.
Interest in the multilayer perceptron (MLP) model continued through the ’60s. This changed when, in 1969, Marvin Minsky and Seymour Papert published their book Perceptrons (MIT Press). The book contained a proof showing that linear perceptrons cannot compute a simple nonlinear function such as XOR. Despite the limitations of the proof (nonlinear perceptron models existed at the time of the book’s publication, and are even noted by the authors), its publication heralded a collapse in funding for neural network models. Research would not recover until the 1980s, with the rise of a new generation of researchers.
The increase in computing power together with the development of the back-propagation technique (known in various forms since the ’60s, but not applied in general until the ’80s) prompted a resurgence of interest in neural networks. Not only did computers have the power to train larger networks, but we also had the techniques to train deeper networks efficiently. The first convolutional neural networks combined these insights with a model of visual recognition from mammalian brains, yielding for the first time networks that could efficiently recognize complex images such as handwritten digits and faces. Convolutional networks do this by applying the same “subnetwork” to different locations of the image and aggregating the results of these into higher-level features. In Chapter 12 we look at how this works in more detail.
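The weight-sharing idea behind convolutional networks can be sketched in a few lines of plain Python. This toy example (the function and data are illustrative, not taken from the book’s code) slides one small filter over every position of a tiny grayscale image, so the same weights are reused everywhere:

```python
def convolve2d(image, kernel):
    """Apply the same small filter at every location of an image.

    This is the weight sharing at the heart of convolutional networks:
    one set of kernel weights is reused at every (y, x) position.
    """
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            acc = 0.0
            for dy in range(kh):
                for dx in range(kw):
                    acc += image[y + dy][x + dx] * kernel[dy][dx]
            row.append(acc)
        out.append(row)
    return out

# A small vertical-edge detector applied to a tiny image that is
# dark on the left and bright on the right:
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [
    [-1, 1],
    [-1, 1],
]
features = convolve2d(image, kernel)  # responds strongly at the edge
```

A real convolutional layer applies many such filters at once and stacks the resulting feature maps, which later layers aggregate into higher-level features.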
In the ’90s and early 2000s interest in neural networks declined again as more “understandable” models like support vector machines (SVMs) and decision trees became popular. SVMs proved to be excellent classifiers for many data sources of the time, especially when coupled with human-engineered features. In computer vision, “feature engineering” became popular. This involves building feature detectors for small elements in a picture and combining them by hand into something that recognizes more complex forms. It later turned out that deep learning nets learn to recognize very similar features and learn to combine them in a very similar way. In Chapter 12 we explore some of the inner workings of these networks and visualize what they learn.
With the advent of general-purpose programming on graphics processing units (GPUs) in the late 2000s, neural network architectures were able to make great strides over the competition. GPUs contain thousands of small processors that together can perform trillions of operations per second in parallel. Originally developed for computer gaming, where this power is needed to render complex 3D scenes in real time, GPUs turned out to be equally well suited to training neural networks in parallel, delivering speedups of a factor of 10 or more.
The other thing that happened was that the internet made very large training sets available. Where researchers had been training classifiers with thousands of images before, now they had access to tens if not hundreds of millions of images. Combined with larger networks, neural networks had their chance to shine. This dominance has only continued in the succeeding years, with improved techniques and applications of neural networks to areas outside of image recognition, including translation, speech recognition, and image synthesis.
While the boom in computational power and better techniques led to an increase in interest in neural networks, we have also seen huge strides in usability. In particular, deep learning frameworks like TensorFlow, Theano, and Torch allow nonexperts to construct complex neural networks to solve their own machine learning problems. This has turned a task that used to require months or years of hand-coding and head-on-table-banging effort (writing efficient GPU kernels is hard!) into something anyone can do in an afternoon (or, realistically, a few days). Increased usability has greatly increased the number of researchers who can work on deep learning problems. Frameworks like Keras, with an even higher level of abstraction, make it possible for anyone with a working knowledge of Python and some tools to run interesting experiments, as this book will show.
A second important factor for “why now” is that large datasets have become available for everybody. Yes, Facebook and Google might still have the upper hand with access to billions of pictures, user comments, and what have you, but datasets with millions of items can be had from a variety of sources. In Chapter 1 we’ll look at a variety of options, and throughout the book the example code for each chapter will usually show in the first recipe how to get the needed training data.
At the same time, private companies have started to produce and collect orders of magnitude more data, which has made the whole area of deep learning suddenly commercially very interesting. A model that can tell the difference between a cat and a dog is all very well, but a model that increases sales by 15% by taking all historic sales data into account can be the difference between life and death for a company.
These days there is a wide choice of platforms, technologies, and programming languages for deep learning. In this book all the examples are in Python and most of the code relies on the excellent Keras framework. The example code is available on GitHub as a set of Python notebooks, one per chapter. So, having a working knowledge of the following will help:
Python 3 is preferred, but Python 2.7 should also work. We use a variety of helper libraries that all can easily be installed using pip. The code is generally straightforward so even a relative novice should be able to follow the action.
The heavy lifting for machine learning is done almost completely by Keras. Keras is an abstraction over either TensorFlow or Theano, both deep learning frameworks. Keras makes it easy to define neural networks in a very readable way. All code is tested against TensorFlow but should also work with Theano.
These useful and extensive libraries are casually used in many recipes. Most of the time it should be clear what is happening from the context, but a quick read-up on them won’t hurt.
Notebooks are a very nice way to share code; they allow for a mixture of code, output of code, and comments, all viewable in the browser.
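To give a feel for the kind of heavy lifting Keras takes off your hands, the core operation inside a fully connected (Dense) layer is just a matrix multiplication plus a nonlinearity. Here is a minimal NumPy sketch of that operation; the names and shapes are illustrative and are not part of the Keras API:

```python
import numpy as np

def dense(x, weights, bias):
    """One fully connected layer: affine transform followed by ReLU.

    This is the core computation a Keras Dense layer performs, minus
    everything Keras adds on top (weight initialization, gradients,
    training loops, GPU execution).
    """
    return np.maximum(0.0, x @ weights + bias)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))   # a batch of 4 examples with 8 features each
w = rng.normal(size=(8, 3))   # weights mapping 8 inputs to 3 units
b = np.zeros(3)               # one bias per unit
out = dense(x, w, b)          # shape (4, 3), all values nonnegative
```

Keras lets you stack dozens of such layers in a few readable lines, and the framework underneath (TensorFlow or Theano) handles the differentiation and GPU execution.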
Each chapter has a corresponding notebook that contains working code. The code in the book often leaves out details like imports, so it is a good idea to get the code from Git and launch a local notebook. First check out the code and enter the new directory:
git clone https://github.com/DOsinga/deep_learning_cookbook.git
cd deep_learning_cookbook
Then set up a virtual environment for the project:
python3 -m venv venv3
source venv3/bin/activate
And install the dependencies:
pip install -r requirements.txt
If you have an NVIDIA GPU, you can swap the default TensorFlow package for the GPU-enabled build:

pip uninstall tensorflow
pip install tensorflow-gpu

You’ll also need a compatible GPU library setup (CUDA and cuDNN), which can be a bit of a hassle.
Finally, bring up the Jupyter notebook server:

jupyter notebook
If everything worked, this should automatically open a web browser with an overview of the notebooks, one for each chapter. Feel free to play with the code; you can use Git to easily undo any changes you’ve made if you want to go back to the baseline:
git checkout <notebook_to_reset>.ipynb
The first section of every chapter lists the notebooks relevant for that chapter and the notebooks are numbered according to the chapters, so it should in general be easy to find your way around. In the notebook folder, you’ll also find three other directories:
Contains data needed by the various notebooks—mostly samples of open datasets or things that would be too cumbersome to generate yourself.
Used to store intermediate data.
Contains a subdirectory for each chapter that holds saved models for that chapter. If you don’t have the time to actually train the models, you can still run the models by loading them from here.
Chapter 1 provides in-depth information about how neural networks function, where to get data from, and how to preprocess that data to make it easier to consume. Chapter 2 is about getting stuck and what to do about it. Neural nets are notoriously hard to debug and the tips and tricks in this chapter on how to make them behave will come in handy when going through the more project-oriented recipes in the rest of the book. If you are impatient, you can skip this chapter and go back to it later when you do get stuck.
Chapters 3 through 15 are grouped around media, starting with text processing, followed by image processing, and finally music processing in Chapter 15. Each chapter describes one project split into various recipes. Typically a chapter will start with a data acquisition recipe, followed by a few recipes that build toward the goal of the chapter and a recipe on data visualization.
Chapter 16 is about using models in production. Running experiments in notebooks is great, but ultimately we want to share our results with actual users and get our models run on real servers or mobile devices. This chapter goes through the options.
The following typographical conventions are used in this book:
Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width
Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.
Constant width italic
Shows text that should be replaced with user-supplied values or by values determined by context.
This element signifies a tip or suggestion.
This element signifies a general note.
Each chapter in this book comes with one or more Python notebooks that contain the example code referred to in the chapters themselves. You can read the chapters without running the code, but it is more fun to work with the notebooks as you read. The code can be found at https://github.com/DOsinga/deep_learning_cookbook.
To get the example code for the recipes up and running, execute the following commands in a shell:
git clone https://github.com/DOsinga/deep_learning_cookbook.git
cd deep_learning_cookbook
python3 -m venv venv3
source venv3/bin/activate
pip install -r requirements.txt
jupyter notebook
This book is here to help you get your job done. All code in the accompanying notebooks is licensed under the permissive Apache License 2.0.
We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Deep Learning Cookbook by Douwe Osinga (O’Reilly). Copyright 2018 Douwe Osinga, 978-1-491-99584-6.”
Safari (formerly Safari Books Online) is a membership-based training and reference platform for enterprise, government, educators, and individuals.
Members have access to thousands of books, training videos, Learning Paths, interactive tutorials, and curated playlists from over 250 publishers, including O’Reilly Media, Harvard Business Review, Prentice Hall Professional, Addison-Wesley Professional, Microsoft Press, Sams, Que, Peachpit Press, Adobe, Focal Press, Cisco Press, John Wiley & Sons, Syngress, Morgan Kaufmann, IBM Redbooks, Packt, Adobe Press, FT Press, Apress, Manning, New Riders, McGraw-Hill, Jones & Bartlett, and Course Technology, among others.
For more information, please visit http://oreilly.com/safari.
Please address comments and questions concerning this book to the publisher:
We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at http://bit.ly/deep-learning-cookbook.
To comment or ask technical questions about this book, send email to firstname.lastname@example.org.
For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.
Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Watch us on YouTube: http://www.youtube.com/oreillymedia
From academics sharing new ideas by (pre)publishing papers on https://arxiv.org, to hackers coding up those ideas on GitHub, to public and private institutions publishing datasets for anybody to use, the world of machine learning is full of people and organizations that welcome newcomers and make it easy to get started. Open data, open source, and open access publishing—this book wouldn’t be here without machine learning’s culture of sharing.
What is true for the ideas presented in this book is even more true for the code in this book. Writing a machine learning model from scratch is hard, so almost all the models in the notebooks are based on code from somewhere else. This is the best way to get things done—find a model that does something similar to what you want and change it step by step, verifying at each step that things still work.
A special thanks goes out to my friend and coauthor for this book, Russell Power. Apart from helping to write this Preface, Chapter 6, and Chapter 7, he has been instrumental in checking the technical soundness of the book and the accompanying code. Moreover, he’s been an invaluable asset as a sounding board for many ideas, some of which made it into the book.
Then there is my lovely wife, who was the first line of defense when it came to proofreading chapters as they came into being. She has an uncanny ability to spot mistakes in text that is neither in her native language nor about a subject in which she has any prior expertise.
The requirements.in file lists the open source packages that are used in this book. A heartfelt thank you goes out to all the contributors to all of these projects. This goes doubly for Keras, since almost all the code is based on that framework and often borrows from its examples.
Example code and ideas from these packages and many blog posts contributed to this book. In particular:
This chapter takes ideas from Slav Ivanov’s blog post “37 Reasons Why Your Neural Network Is Not Working”.
Thanks to Google for publishing its Word2vec model.
Radim Řehůřek’s Gensim powers this chapter, and some of the code is based on examples from this great project.
This chapter draws heavily on the great blog post “The Unreasonable Effectiveness of Recurrent Neural Networks” by Andrej Karpathy. That blog post rekindled my interest in neural networks.
The visualization was inspired by Motoki Wu’s “Visualizations of Recurrent Neural Networks”.
This chapter was somewhat inspired by the Quora Question Pairs challenge on Kaggle.
The example code is copied from one of the Keras examples, but applied on a slightly different dataset.
This chapter is based on Yann Henon’s keras_frcnn.
Code and ideas are based on Nicholas Normandin’s Conditional Variational Autoencoder.
Autoencoder training code for Keras is based on Qin Yongliang’s DCGAN-Keras.
This was inspired by Heitor Guimarães’s gtzan.keras.