
Deep Learning

by Josh Patterson, Adam Gibson

August 2017

Intermediate to advanced

530 pages


English

O'Reilly Media, Inc.


Preface
What’s in This Book?; Who Is “The Practitioner”?; Who Should Read This Book?; The Enterprise Machine Learning Practitioner; The Enterprise Executive; The Academic; Conventions Used in This Book; Using Code Examples; Administrative Notes; O’Reilly Safari; How to Contact Us; Acknowledgments; Josh; Adam
1. A Review of Machine Learning
The Learning Machines; How Can Machines Learn?; Biological Inspiration; What Is Deep Learning?; Going Down the Rabbit Hole; Framing the Questions; The Math Behind Machine Learning: Linear Algebra; Scalars; Vectors; Matrices; Tensors; Hyperplanes; Relevant Mathematical Operations; Converting Data Into Vectors; Solving Systems of Equations; The Math Behind Machine Learning: Statistics; Probability; Conditional Probabilities; Posterior Probability; Distributions; Samples Versus Population; Resampling Methods; Selection Bias; Likelihood; How Does Machine Learning Work?; Regression; Classification; Clustering; Underfitting and Overfitting; Optimization; Convex Optimization; Gradient Descent; Stochastic Gradient Descent; Quasi-Newton Optimization Methods; Generative Versus Discriminative Models; Logistic Regression; The Logistic Function; Understanding Logistic Regression Output; Evaluating Models; The Confusion Matrix; Building an Understanding of Machine Learning
2. Foundations of Neural Networks and Deep Learning
Neural Networks; The Biological Neuron; The Perceptron; Multilayer Feed-Forward Networks; Training Neural Networks; Backpropagation Learning; Activation Functions; Linear; Sigmoid; Tanh; Hard Tanh; Softmax; Rectified Linear; Loss Functions; Loss Function Notation; Loss Functions for Regression; Loss Functions for Classification; Loss Functions for Reconstruction; Hyperparameters; Learning Rate; Regularization; Momentum; Sparsity
3. Fundamentals of Deep Networks
Defining Deep Learning; What Is Deep Learning?; Organization of This Chapter; Common Architectural Principles of Deep Networks; Parameters; Layers; Activation Functions; Loss Functions; Optimization Algorithms; Hyperparameters; Summary; Building Blocks of Deep Networks; RBMs; Autoencoders; Variational Autoencoders
4. Major Architectures of Deep Networks
Unsupervised Pretrained Networks; Deep Belief Networks; Generative Adversarial Networks; Convolutional Neural Networks (CNNs); Biological Inspiration; Intuition; CNN Architecture Overview; Input Layers; Convolutional Layers; Pooling Layers; Fully Connected Layers; Other Applications of CNNs; CNNs of Note; Summary; Recurrent Neural Networks; Modeling the Time Dimension; 3D Volumetric Input; Why Not Markov Models?; General Recurrent Neural Network Architecture; LSTM Networks; Domain-Specific Applications and Blended Networks; Recursive Neural Networks; Network Architecture; Varieties of Recursive Neural Networks; Applications of Recursive Neural Networks; Summary and Discussion; Will Deep Learning Make Other Algorithms Obsolete?; Different Problems Have Different Best Methods; When Do I Need Deep Learning?
5. Building Deep Networks
Matching Deep Networks to the Right Problem; Columnar Data and Multilayer Perceptrons; Images and Convolutional Neural Networks; Time-series Sequences and Recurrent Neural Networks; Using Hybrid Networks; The DL4J Suite of Tools; Vectorization and DataVec; Runtimes and ND4J; Basic Concepts of the DL4J API; Loading and Saving Models; Getting Input for the Model; Setting Up Model Architecture; Training and Evaluation; Modeling CSV Data with Multilayer Perceptron Networks; Setting Up Input Data; Determining Network Architecture; Training the Model; Evaluating the Model; Modeling Handwritten Images Using CNNs; Java Code Listing for the LeNet CNN; Loading and Vectorizing the Input Images; Network Architecture for LeNet in DL4J; Training the CNN; Modeling Sequence Data by Using Recurrent Neural Networks; Generating Shakespeare via LSTMs; Classifying Sensor Time-series Sequences Using LSTMs; Using Autoencoders for Anomaly Detection; Java Code Listing for Autoencoder Example; Setting Up Input Data; Autoencoder Network Architecture and Training; Evaluating the Model; Using Variational Autoencoders to Reconstruct MNIST Digits; Code Listing to Reconstruct MNIST Digits; Examining the VAE Model; Applications of Deep Learning in Natural Language Processing; Learning Word Embedding Using Word2Vec; Distributed Representations of Sentences with Paragraph Vectors; Using Paragraph Vectors for Document Classification
6. Tuning Deep Networks
Basic Concepts in Tuning Deep Networks; An Intuition for Building Deep Networks; Building the Intuition as a Step-by-Step Process; Matching Input Data and Network Architectures; Summary; Relating Model Goal and Output Layers; Regression Model Output Layer; Classification Model Output Layer; Working with Layer Count, Parameter Count, and Memory; Feed-Forward Multilayer Neural Networks; Controlling Layer and Parameter Counts; Estimating Network Memory Requirements; Weight Initialization Strategies; Using Activation Functions; Summary Table for Activation Functions; Applying Loss Functions; Understanding Learning Rates; Using the Ratio of Updates-to-Parameters; Specific Recommendations for Learning Rates; How Sparsity Affects Learning; Applying Methods of Optimization; SGD Best Practices; Using Parallelization and GPUs for Faster Training; Online Learning and Parallel Iterative Algorithms; Parallelizing SGD in DL4J; GPUs; Controlling Epochs and Mini-Batch Size; Understanding Mini-Batch Size Trade-Offs; How to Use Regularization; Priors as Regularizers; Max-Norm Regularization; Dropout; Other Regularization Topics; Working with Class Imbalance; Methods for Sampling Classes; Weighted Loss Functions; Dealing with Overfitting; Using Network Statistics from the Tuning UI; Detecting Poor Weight Initialization; Detecting Nonshuffled Data; Detecting Issues with Regularization
7. Tuning Specific Deep Network Architectures
Convolutional Neural Networks (CNNs); Common Convolutional Architectural Patterns; Configuring Convolutional Layers; Configuring Pooling Layers; Transfer Learning; Recurrent Neural Networks; Network Input Data and Input Layers; Output Layers and RnnOutputLayer; Training the Network; Debugging Common Issues with LSTMs; Padding and Masking; Evaluation and Scoring With Masking; Variants of Recurrent Network Architectures; Restricted Boltzmann Machines; Hidden Units and Modeling Available Information; Using Different Units; Using Regularization with RBMs; DBNs; Using Momentum; Using Regularization; Determining Hidden Unit Count
8. Vectorization
Introduction to Vectorization in Machine Learning; Why Do We Need to Vectorize Data?; Strategies for Dealing with Columnar Raw Data Attributes; Feature Engineering and Normalization Techniques; Using DataVec for ETL and Vectorization; Vectorizing Image Data; Image Data Representation in DL4J; Image Data and Vector Normalization with DataVec; Working with Sequential Data in Vectorization; Major Variations of Sequential Data Sources; Vectorizing Sequential Data with DataVec; Working with Text in Vectorization; Bag of Words; TF-IDF; Comparing Word2Vec and VSM Comparison; Working with Graphs
9. Using Deep Learning and DL4J on Spark
Introduction to Using DL4J with Spark and Hadoop; Operating Spark from the Command Line; Configuring and Tuning Spark Execution; Running Spark on Mesos; Running Spark on YARN; General Spark Tuning Guide; Tuning DL4J Jobs on Spark; Setting Up a Maven Project Object Model for Spark and DL4J; A pom.xml File Dependency Template; Setting Up a POM File for CDH 5.X; Setting Up a POM File for HDP 2.4; Troubleshooting Spark and Hadoop; Common Issues with ND4J; DL4J Parallel Execution on Spark; A Minimal Spark Training Example; DL4J API Best Practices for Spark; Multilayer Perceptron Spark Example; Setting Up MLP Network Architecture for Spark; Distributed Training and Model Evaluation; Building and Executing a DL4J Spark Job; Generating Shakespeare Text with Spark and Long Short-Term Memory; Setting Up the LSTM Network Architecture; Training, Tracking Progress, and Understanding Results; Modeling MNIST with a Convolutional Neural Network on Spark; Configuring the Spark Job and Loading MNIST Data; Setting Up the LeNet CNN Architecture and Training

A. What Is Artificial Intelligence?
The Story So Far; Defining Deep Learning; Defining Artificial Intelligence; What Is Driving Interest Today in AI Today?; Winter Is Coming
B. RL4J and Reinforcement Learning
Preliminaries; Markov Decision Process; Terminology; Different Settings; Model-Free; Observation Setting; Single-Player and Adversarial Games; Q-Learning; From Policy to Neural Networks the following; Policy Iteration; Exploration Versus Exploitation; Bellman Equation; Initial State Sampling; Q-Learning Implementation; Modeling Q(s,a); Experience Replay; Convolutional Layers and Image Preprocessing; History Processing; Double Q-Learning; Clipping; Scaling Rewards; Prioritized Replay; Graph, Visualization, and Mean-Q; RL4J; Conclusion
C. Numbers Everyone Should Know
D. Neural Networks and Backpropagation: A Mathematical Approach
Introduction; Backpropagation in a Multilayer Perceptron
E. Using the ND4J API
Design and Basic Usage; Understanding NDArrays; ND4J General Syntax; The Basics of Working with NDArrays; Dataset; Creating Input Vectors; Basics of Vector Creation; Using MLLibUtil; Converting from INDArray to MLLib Vector; Converting from MLLib Vector to INDArray; Making Model Predictions with DL4J; Using the DL4J and ND4J Together
F. Using DataVec
Loading Data for Machine Learning; Loading CSV Data for Multilayer Perceptrons; Loading Image Data for Convolutional Neural Networks; Loading Sequence Data for Recurrent Neural Networks; Transforming Data: Data Wrangling with DataVec; DataVec Transforms: Key Concepts; DataVec Transform Functionality: An Example
G. Working with DL4J from Source
Verifying Git Is Installed; Cloning Key DL4J GitHub Projects; Downloading Source via Zip File; Using Maven to Build Source Code
H. Setting Up DL4J Projects
Creating a New DL4J Project; Java; Working with Maven; IDEs; Setting Up Other Maven POMs; ND4J and Maven
I. Setting Up GPUs for DL4J Projects
Switching Backends to GPU; Picking a GPU; Training on a Multiple GPU System; CUDA on Different Platforms; Monitoring GPU Performance; NVIDIA System Management Interface
J. Troubleshooting DL4J Installations
Previous Installation; Memory Errors When Installing From Source; Older Versions of Maven; Maven and PATH Variables; Bad JDK Versions; C++ and Other Development Tools; Windows and Include Paths; Monitoring GPUs; Using the JVisualVM; Working with Clojure; OS X and Float Support; Fork-Join Bug in Java 7; Precautions; Other Local Repositories; Check Maven Dependencies; Reinstall Dependencies; If All Else Fails; Different Platforms; OS X; Windows; Linux
Index
About the Authors


Appendix F. Using DataVec

Alex Black

DataVec is a library for handling machine learning data: it covers the Extract, Transform, and Load (ETL), or vectorization, stage of a machine learning pipeline. Its goal is to simplify preparing raw data and loading it into a format ready for machine learning. DataVec can load tabular data (comma-separated values [CSV] files, etc.), images, and time series, both on a single machine and in distributed (Apache Spark) applications.
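To make "vectorization" concrete, here is a minimal plain-Java sketch of what that step produces for a single CSV row: a numeric feature vector plus a one-hot label vector. It is not DataVec's API. The class CsvRowVectorizer, its two methods, and the Iris-style example row are hypothetical illustrations, and the sketch assumes every non-label column is numeric and that the full set of class labels is known in advance.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (not DataVec's API): vectorize one CSV row into
// a numeric feature vector and a one-hot label vector.
class CsvRowVectorizer {

    // Parses every column except labelIndex as a double (assumes numeric features).
    static double[] features(String line, int labelIndex) {
        String[] cols = line.split(",");
        double[] out = new double[cols.length - 1];
        int j = 0;
        for (int i = 0; i < cols.length; i++) {
            if (i == labelIndex) {
                continue; // the label is not a feature
            }
            out[j++] = Double.parseDouble(cols[i].trim());
        }
        return out;
    }

    // One-hot encodes the label column against a fixed, known list of classes.
    static double[] oneHotLabel(String line, int labelIndex, List<String> classes) {
        String label = line.split(",")[labelIndex].trim();
        int k = classes.indexOf(label);
        if (k < 0) {
            throw new IllegalArgumentException("Unknown label: " + label);
        }
        double[] out = new double[classes.size()];
        out[k] = 1.0;
        return out;
    }

    public static void main(String[] args) {
        List<String> classes = Arrays.asList("setosa", "versicolor", "virginica");
        String row = "5.1,3.5,1.4,0.2,setosa"; // illustrative Iris-style row
        System.out.println(Arrays.toString(features(row, 4)));             // [5.1, 3.5, 1.4, 0.2]
        System.out.println(Arrays.toString(oneHotLabel(row, 4, classes))); // [1.0, 0.0, 0.0]
    }
}
```

A library such as DataVec handles this conversion across many files and formats; the sketch only shows the shape of the output a network expects.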

ND4J Vector Creation and DataVec

DataVec is meant to handle many of the feature and label creation chores mentioned previously in this book. Using DataVec is considered a best practice for DL4J workflows on a single machine and on Spark.

DataVec provides two main categories of functionality:

Functionality for loading data, from a variety of formats
Functionality for performing common data transformation operations (often called data wrangling or data munging)
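The second category, data wrangling, can be pictured as a chain of record-to-record functions applied in order. The following is a minimal sketch under that assumption, written in plain Java rather than with DataVec's own transform classes. RecordTransforms and its three example operations (dropping a column, mapping a categorical value to an integer index, and min-max scaling a numeric column) are hypothetical illustrations, and the example record is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch (not DataVec's API): data wrangling as a chain of
// functions, each mapping one record (a list of column values) to a new one.
class RecordTransforms {

    // Removes the column at the given index.
    static UnaryOperator<List<String>> removeColumn(int index) {
        return rec -> {
            List<String> out = new ArrayList<>(rec);
            out.remove(index); // remove(int): removes by position
            return out;
        };
    }

    // Replaces a categorical value with its integer index in a known category list.
    static UnaryOperator<List<String>> categoricalToInteger(int index, List<String> categories) {
        return rec -> {
            List<String> out = new ArrayList<>(rec);
            int k = categories.indexOf(out.get(index));
            if (k < 0) {
                throw new IllegalArgumentException("Unknown category: " + out.get(index));
            }
            out.set(index, Integer.toString(k));
            return out;
        };
    }

    // Rescales a numeric column to [0, 1] using a known min and max.
    static UnaryOperator<List<String>> minMaxNormalize(int index, double min, double max) {
        return rec -> {
            List<String> out = new ArrayList<>(rec);
            double x = Double.parseDouble(out.get(index));
            out.set(index, Double.toString((x - min) / (max - min)));
            return out;
        };
    }

    // Applies the steps in order; each step sees the previous step's output.
    @SafeVarargs
    static List<String> apply(List<String> rec, UnaryOperator<List<String>>... steps) {
        List<String> cur = rec;
        for (UnaryOperator<List<String>> step : steps) {
            cur = step.apply(cur);
        }
        return cur;
    }

    public static void main(String[] args) {
        List<String> rec = Arrays.asList("17", "red", "30"); // id, color, price
        List<String> out = apply(rec,
                removeColumn(0),                                                // drop the id column
                categoricalToInteger(0, Arrays.asList("red", "green", "blue")), // color is now column 0
                minMaxNormalize(1, 0.0, 120.0));                                // price is now column 1
        System.out.println(out); // [0, 0.25]
    }
}
```

Because each step operates on the previous step's output, column indices shift after a removal; the comments in main track that shift.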

These two categories of functionality are discussed separately in the sections that follow.

Loading Data for Machine Learning

Machine learning data comes in a wide variety of formats, with different requirements and libraries for loading each. Too often, machine learning practitioners end up writing one-off code to load their data; this can be both time consuming and error prone. DataVec attempts to alleviate these issues in two ways: first, by providing data loading functionality for common use ...
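The alternative to one-off loading code is a single record-oriented interface that every format-specific reader implements, so that downstream code (batching, transforms, vectorization) is written once. The sketch below illustrates that idea in plain Java. RecordSource, CsvSource, and nextBatch are hypothetical names, not DataVec classes, and the CSV parsing is deliberately naive (no quoted fields).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch (not DataVec's API): one reusable record interface
// that format-specific readers implement, so downstream code is written once.
class RecordLoading {

    // Any source that yields records as lists of column values.
    interface RecordSource extends Iterator<List<String>> {
    }

    // Naive CSV reader: splits on commas, no quoted-field handling.
    static class CsvSource implements RecordSource {
        private final BufferedReader in;
        private String pending; // next unread line, or null at end of input

        CsvSource(Reader reader, int skipLines) {
            this.in = new BufferedReader(reader);
            for (int i = 0; i < skipLines; i++) {
                readLine(); // e.g. skip a header row
            }
            pending = readLine();
        }

        private String readLine() {
            try {
                return in.readLine();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        @Override
        public boolean hasNext() {
            return pending != null;
        }

        @Override
        public List<String> next() {
            if (pending == null) {
                throw new NoSuchElementException();
            }
            List<String> fields = Arrays.asList(pending.split(",", -1));
            pending = readLine();
            return fields;
        }
    }

    // Format-agnostic helper: pulls up to batchSize records from any RecordSource.
    static List<List<String>> nextBatch(RecordSource source, int batchSize) {
        List<List<String>> batch = new ArrayList<>();
        while (batch.size() < batchSize && source.hasNext()) {
            batch.add(source.next());
        }
        return batch;
    }

    public static void main(String[] args) {
        String csv = "a,b,label\n1,2,x\n3,4,y\n5,6,x\n";
        RecordSource source = new CsvSource(new StringReader(csv), 1); // skip the header
        System.out.println(nextBatch(source, 2)); // [[1, 2, x], [3, 4, y]]
        System.out.println(nextBatch(source, 2)); // [[5, 6, x]]
    }
}
```

Supporting another format in this design means writing one more RecordSource implementation; nextBatch and any later processing stay unchanged.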


ISBN: 9781491924570
