Deep Learning

Book description

Although interest in machine learning has reached a high point, lofty expectations often scuttle projects before they get very far. How can machine learning—especially deep neural networks—make a real difference in your organization? This hands-on guide not only provides the most practical information available on the subject, but also helps you get started building efficient deep learning networks.

Authors Adam Gibson and Josh Patterson provide theory on deep learning before introducing their open-source Deeplearning4j (DL4J) library for developing production-class workflows. Through real-world examples, you’ll learn methods and strategies for training deep network architectures and running deep learning workflows on Spark and Hadoop with DL4J.

  • Dive into machine learning concepts in general, as well as deep learning in particular
  • Understand how deep networks evolved from neural network fundamentals
  • Explore the major deep network architectures, including Convolutional and Recurrent
  • Learn how to map specific deep networks to the right problem
  • Walk through the fundamentals of tuning general neural networks and specific deep network architectures
  • Use vectorization techniques for different data types with DataVec, DL4J’s workflow tool
  • Learn how to use DL4J natively on Spark and Hadoop

Table of contents

  1. Preface
    1. What’s in This Book?
    2. Who Is “The Practitioner”?
    3. Who Should Read This Book?
      1. The Enterprise Machine Learning Practitioner
      2. The Enterprise Executive
      3. The Academic
    4. Conventions Used in This Book
    5. Using Code Examples
    6. Administrative Notes
    7. O’Reilly Safari
    8. How to Contact Us
    9. Acknowledgments
      1. Josh
      2. Adam
  2. 1. A Review of Machine Learning
    1. The Learning Machines
      1. How Can Machines Learn?
      2. Biological Inspiration
      3. What Is Deep Learning?
      4. Going Down the Rabbit Hole
    2. Framing the Questions
    3. The Math Behind Machine Learning: Linear Algebra
      1. Scalars
      2. Vectors
      3. Matrices
      4. Tensors
      5. Hyperplanes
      6. Relevant Mathematical Operations
      7. Converting Data Into Vectors
      8. Solving Systems of Equations
    4. The Math Behind Machine Learning: Statistics
      1. Probability
      2. Conditional Probabilities
      3. Posterior Probability
      4. Distributions
      5. Samples Versus Population
      6. Resampling Methods
      7. Selection Bias
      8. Likelihood
    5. How Does Machine Learning Work?
      1. Regression
      2. Classification
      3. Clustering
      4. Underfitting and Overfitting
      5. Optimization
      6. Convex Optimization
      7. Gradient Descent
      8. Stochastic Gradient Descent
      9. Quasi-Newton Optimization Methods
      10. Generative Versus Discriminative Models
    6. Logistic Regression
      1. The Logistic Function
      2. Understanding Logistic Regression Output
    7. Evaluating Models
      1. The Confusion Matrix
    8. Building an Understanding of Machine Learning
  3. 2. Foundations of Neural Networks and Deep Learning
    1. Neural Networks
      1. The Biological Neuron
      2. The Perceptron
      3. Multilayer Feed-Forward Networks
    2. Training Neural Networks
      1. Backpropagation Learning
    3. Activation Functions
      1. Linear
      2. Sigmoid
      3. Tanh
      4. Hard Tanh
      5. Softmax
      6. Rectified Linear
    4. Loss Functions
      1. Loss Function Notation
      2. Loss Functions for Regression
      3. Loss Functions for Classification
      4. Loss Functions for Reconstruction
    5. Hyperparameters
      1. Learning Rate
      2. Regularization
      3. Momentum
      4. Sparsity
  4. 3. Fundamentals of Deep Networks
    1. Defining Deep Learning
      1. What Is Deep Learning?
      2. Organization of This Chapter
    2. Common Architectural Principles of Deep Networks
      1. Parameters
      2. Layers
      3. Activation Functions
      4. Loss Functions
      5. Optimization Algorithms
      6. Hyperparameters
      7. Summary
    3. Building Blocks of Deep Networks
      1. RBMs
      2. Autoencoders
      3. Variational Autoencoders
  5. 4. Major Architectures of Deep Networks
    1. Unsupervised Pretrained Networks
      1. Deep Belief Networks
      2. Generative Adversarial Networks
    2. Convolutional Neural Networks (CNNs)
      1. Biological Inspiration
      2. Intuition
      3. CNN Architecture Overview
      4. Input Layers
      5. Convolutional Layers
      6. Pooling Layers
      7. Fully Connected Layers
      8. Other Applications of CNNs
      9. CNNs of Note
      10. Summary
    3. Recurrent Neural Networks
      1. Modeling the Time Dimension
      2. 3D Volumetric Input
      3. Why Not Markov Models?
      4. General Recurrent Neural Network Architecture
      5. LSTM Networks
      6. Domain-Specific Applications and Blended Networks
    4. Recursive Neural Networks
      1. Network Architecture
      2. Varieties of Recursive Neural Networks
      3. Applications of Recursive Neural Networks
    5. Summary and Discussion
      1. Will Deep Learning Make Other Algorithms Obsolete?
      2. Different Problems Have Different Best Methods
      3. When Do I Need Deep Learning?
  6. 5. Building Deep Networks
    1. Matching Deep Networks to the Right Problem
      1. Columnar Data and Multilayer Perceptrons
      2. Images and Convolutional Neural Networks
      3. Time-series Sequences and Recurrent Neural Networks
      4. Using Hybrid Networks
    2. The DL4J Suite of Tools
      1. Vectorization and DataVec
      2. Runtimes and ND4J
    3. Basic Concepts of the DL4J API
      1. Loading and Saving Models
      2. Getting Input for the Model
      3. Setting Up Model Architecture
      4. Training and Evaluation
    4. Modeling CSV Data with Multilayer Perceptron Networks
      1. Setting Up Input Data
      2. Determining Network Architecture
      3. Training the Model
      4. Evaluating the Model
    5. Modeling Handwritten Images Using CNNs
      1. Java Code Listing for the LeNet CNN
      2. Loading and Vectorizing the Input Images
      3. Network Architecture for LeNet in DL4J
      4. Training the CNN
    6. Modeling Sequence Data by Using Recurrent Neural Networks
      1. Generating Shakespeare via LSTMs
      2. Classifying Sensor Time-series Sequences Using LSTMs
    7. Using Autoencoders for Anomaly Detection
      1. Java Code Listing for Autoencoder Example
      2. Setting Up Input Data
      3. Autoencoder Network Architecture and Training
      4. Evaluating the Model
    8. Using Variational Autoencoders to Reconstruct MNIST Digits
      1. Code Listing to Reconstruct MNIST Digits
      2. Examining the VAE Model
    9. Applications of Deep Learning in Natural Language Processing
      1. Learning Word Embedding Using Word2Vec
      2. Distributed Representations of Sentences with Paragraph Vectors
      3. Using Paragraph Vectors for Document Classification
  7. 6. Tuning Deep Networks
    1. Basic Concepts in Tuning Deep Networks
      1. An Intuition for Building Deep Networks
      2. Building the Intuition as a Step-by-Step Process
    2. Matching Input Data and Network Architectures
      1. Summary
    3. Relating Model Goal and Output Layers
      1. Regression Model Output Layer
      2. Classification Model Output Layer
    4. Working with Layer Count, Parameter Count, and Memory
      1. Feed-Forward Multilayer Neural Networks
      2. Controlling Layer and Parameter Counts
      3. Estimating Network Memory Requirements
    5. Weight Initialization Strategies
    6. Using Activation Functions
      1. Summary Table for Activation Functions
    7. Applying Loss Functions
    8. Understanding Learning Rates
      1. Using the Ratio of Updates-to-Parameters
      2. Specific Recommendations for Learning Rates
    9. How Sparsity Affects Learning
    10. Applying Methods of Optimization
      1. SGD Best Practices
    11. Using Parallelization and GPUs for Faster Training
      1. Online Learning and Parallel Iterative Algorithms
      2. Parallelizing SGD in DL4J
      3. GPUs
    12. Controlling Epochs and Mini-Batch Size
      1. Understanding Mini-Batch Size Trade-Offs
    13. How to Use Regularization
      1. Priors as Regularizers
      2. Max-Norm Regularization
      3. Dropout
      4. Other Regularization Topics
    14. Working with Class Imbalance
      1. Methods for Sampling Classes
      2. Weighted Loss Functions
    15. Dealing with Overfitting
    16. Using Network Statistics from the Tuning UI
      1. Detecting Poor Weight Initialization
      2. Detecting Nonshuffled Data
      3. Detecting Issues with Regularization
  8. 7. Tuning Specific Deep Network Architectures
    1. Convolutional Neural Networks (CNNs)
      1. Common Convolutional Architectural Patterns
      2. Configuring Convolutional Layers
      3. Configuring Pooling Layers
      4. Transfer Learning
    2. Recurrent Neural Networks
      1. Network Input Data and Input Layers
      2. Output Layers and RnnOutputLayer
      3. Training the Network
      4. Debugging Common Issues with LSTMs
      5. Padding and Masking
      6. Evaluation and Scoring With Masking
      7. Variants of Recurrent Network Architectures
    3. Restricted Boltzmann Machines
      1. Hidden Units and Modeling Available Information
      2. Using Different Units
      3. Using Regularization with RBMs
    4. DBNs
      1. Using Momentum
      2. Using Regularization
      3. Determining Hidden Unit Count
  9. 8. Vectorization
    1. Introduction to Vectorization in Machine Learning
      1. Why Do We Need to Vectorize Data?
      2. Strategies for Dealing with Columnar Raw Data Attributes
      3. Feature Engineering and Normalization Techniques
    2. Using DataVec for ETL and Vectorization
    3. Vectorizing Image Data
      1. Image Data Representation in DL4J
      2. Image Data and Vector Normalization with DataVec
    4. Working with Sequential Data in Vectorization
      1. Major Variations of Sequential Data Sources
      2. Vectorizing Sequential Data with DataVec
    5. Working with Text in Vectorization
      1. Bag of Words
      2. TF-IDF
      3. Comparing Word2Vec and VSM
    6. Working with Graphs
  10. 9. Using Deep Learning and DL4J on Spark
    1. Introduction to Using DL4J with Spark and Hadoop
      1. Operating Spark from the Command Line
    2. Configuring and Tuning Spark Execution
      1. Running Spark on Mesos
      2. Running Spark on YARN
      3. General Spark Tuning Guide
      4. Tuning DL4J Jobs on Spark
    3. Setting Up a Maven Project Object Model for Spark and DL4J
      1. A pom.xml File Dependency Template
      2. Setting Up a POM File for CDH 5.X
      3. Setting Up a POM File for HDP 2.4
    4. Troubleshooting Spark and Hadoop
      1. Common Issues with ND4J
    5. DL4J Parallel Execution on Spark
      1. A Minimal Spark Training Example
    6. DL4J API Best Practices for Spark
    7. Multilayer Perceptron Spark Example
      1. Setting Up MLP Network Architecture for Spark
      2. Distributed Training and Model Evaluation
      3. Building and Executing a DL4J Spark Job
    8. Generating Shakespeare Text with Spark and Long Short-Term Memory
      1. Setting Up the LSTM Network Architecture
      2. Training, Tracking Progress, and Understanding Results
    9. Modeling MNIST with a Convolutional Neural Network on Spark
      1. Configuring the Spark Job and Loading MNIST Data
      2. Setting Up the LeNet CNN Architecture and Training
  11. A. What Is Artificial Intelligence?
    1. The Story So Far
      1. Defining Deep Learning
      2. Defining Artificial Intelligence
    2. What Is Driving Interest in AI Today?
    3. Winter Is Coming
  12. B. RL4J and Reinforcement Learning
    1. Preliminaries
      1. Markov Decision Process
      2. Terminology
    2. Different Settings
      1. Model-Free
      2. Observation Setting
      3. Single-Player and Adversarial Games
    3. Q-Learning
      1. From Policy to Neural Networks
      2. Policy Iteration
      3. Exploration Versus Exploitation
      4. Bellman Equation
      5. Initial State Sampling
      6. Q-Learning Implementation
      7. Modeling Q(s,a)
      8. Experience Replay
      9. Convolutional Layers and Image Preprocessing
      10. History Processing
      11. Double Q-Learning
      12. Clipping
      13. Scaling Rewards
      14. Prioritized Replay
    4. Graph, Visualization, and Mean-Q
    5. RL4J
    6. Conclusion
  13. C. Numbers Everyone Should Know
  14. D. Neural Networks and Backpropagation: A Mathematical Approach
    1. Introduction
    2. Backpropagation in a Multilayer Perceptron
  15. E. Using the ND4J API
    1. Design and Basic Usage
      1. Understanding NDArrays
      2. ND4J General Syntax
      3. The Basics of Working with NDArrays
      4. Dataset
    2. Creating Input Vectors
      1. Basics of Vector Creation
    3. Using MLLibUtil
      1. Converting from INDArray to MLLib Vector
      2. Converting from MLLib Vector to INDArray
    4. Making Model Predictions with DL4J
      1. Using DL4J and ND4J Together
  16. F. Using DataVec
    1. Loading Data for Machine Learning
    2. Loading CSV Data for Multilayer Perceptrons
    3. Loading Image Data for Convolutional Neural Networks
    4. Loading Sequence Data for Recurrent Neural Networks
    5. Transforming Data: Data Wrangling with DataVec
      1. DataVec Transforms: Key Concepts
      2. DataVec Transform Functionality: An Example
  17. G. Working with DL4J from Source
    1. Verifying Git Is Installed
    2. Cloning Key DL4J GitHub Projects
    3. Downloading Source via Zip File
    4. Using Maven to Build Source Code
  18. H. Setting Up DL4J Projects
    1. Creating a New DL4J Project
      1. Java
      2. Working with Maven
      3. IDEs
    2. Setting Up Other Maven POMs
      1. ND4J and Maven
  19. I. Setting Up GPUs for DL4J Projects
    1. Switching Backends to GPU
      1. Picking a GPU
      2. Training on a Multiple GPU System
    2. CUDA on Different Platforms
    3. Monitoring GPU Performance
      1. NVIDIA System Management Interface
  20. J. Troubleshooting DL4J Installations
    1. Previous Installation
    2. Memory Errors When Installing From Source
    3. Older Versions of Maven
    4. Maven and PATH Variables
    5. Bad JDK Versions
    6. C++ and Other Development Tools
    7. Windows and Include Paths
    8. Monitoring GPUs
    9. Using the JVisualVM
    10. Working with Clojure
    11. OS X and Float Support
    12. Fork-Join Bug in Java 7
    13. Precautions
      1. Other Local Repositories
      2. Check Maven Dependencies
      3. Reinstall Dependencies
      4. If All Else Fails
    14. Different Platforms
      1. OS X
      2. Windows
      3. Linux
  21. Index
  22. About the Authors

Product information

  • Title: Deep Learning
  • Author(s): Josh Patterson, Adam Gibson
  • Release date: August 2017
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781491914250