Hands-On Machine Learning with Scikit-Learn and TensorFlow

by Aurélien Géron

March 2017

Intermediate to advanced

572 pages

15h 56m

English

O'Reilly Media, Inc.

Chapter 1: The Machine Learning Landscape
Chapter 2: End-to-End Machine Learning Project
Chapter 3: Classification
Chapter 4: Training Models
Chapter 5: Support Vector Machines
Chapter 6: Decision Trees
Chapter 7: Ensemble Learning and Random Forests
Chapter 8: Dimensionality Reduction
Chapter 9: Up and Running with TensorFlow
Chapter 10: Introduction to Artificial Neural Networks
Chapter 11: Training Deep Neural Nets
Chapter 12: Distributing TensorFlow Across Devices and Servers
Chapter 13: Convolutional Neural Networks
Chapter 14: Recurrent Neural Networks
Chapter 15: Autoencoders
Chapter 16: Reinforcement Learning

Content preview from Hands-On Machine Learning with Scikit-Learn and TensorFlow

Chapter 14. Recurrent Neural Networks

The batter hits the ball. You immediately start running, anticipating the ball’s trajectory. You track it and adapt your movements, and finally catch it (under a thunder of applause). Predicting the future is what you do all the time, whether you are finishing a friend’s sentence or anticipating the smell of coffee at breakfast. In this chapter, we are going to discuss recurrent neural networks (RNNs), a class of nets that can predict the future (well, up to a point, of course). They can analyze time series data such as stock prices, and tell you when to buy or sell. In autonomous driving systems, they can anticipate car trajectories and help avoid accidents. More generally, they can work on sequences of arbitrary length, rather than on fixed-size inputs like all the nets we have discussed so far. For example, they can take sentences, documents, or audio samples as input, making them extremely useful for natural language processing (NLP) systems such as automatic translation, speech-to-text, or sentiment analysis (e.g., reading movie reviews and extracting the rater’s feeling about the movie).

Moreover, RNNs’ ability to anticipate also makes them capable of surprising creativity. You can ask them to predict which are the most likely next notes in a melody, then randomly pick one of these notes and play it. Then ask the net for the next most likely notes, pick and play one of them, and repeat the process again and again. Before you know it, your net will compose ...
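
The predict, pick, play, repeat loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_next_note_probs` is a hypothetical stand-in for a trained RNN (it simply favors notes close to the previous one), and the names `NOTES`, `compose`, and `top_k` are invented for this sketch rather than taken from the book.

```python
import random

# Hypothetical note vocabulary for this sketch.
NOTES = ["C", "D", "E", "F", "G", "A", "B"]


def toy_next_note_probs(melody):
    """Stand-in for a trained RNN (an assumption, not a real model).

    Returns one probability per entry in NOTES, favoring notes that are
    close to the last note played.
    """
    last = NOTES.index(melody[-1])
    weights = [1.0 / (1 + abs(i - last)) for i in range(len(NOTES))]
    total = sum(weights)
    return [w / total for w in weights]


def compose(seed, n_steps, predict=toy_next_note_probs, top_k=3, rng=None):
    """Extend `seed` by repeatedly sampling one of the most likely next notes."""
    rng = rng or random.Random()
    melody = list(seed)
    for _ in range(n_steps):
        # 1. Ask the "net" for the probability of each possible next note.
        probs = predict(melody)
        # 2. Keep only the top_k most likely candidates.
        candidates = sorted(range(len(NOTES)), key=lambda i: probs[i],
                            reverse=True)[:top_k]
        # 3. Randomly pick one candidate, weighted by its probability.
        picked = rng.choices(candidates,
                             weights=[probs[i] for i in candidates], k=1)[0]
        # 4. "Play" the note by appending it, then feed the longer melody back in.
        melody.append(NOTES[picked])
    return melody


melody = compose(["C", "E"], n_steps=8, rng=random.Random(42))
print(" ".join(melody))
```

Sampling among the likeliest candidates, rather than always taking the single most likely note, is what keeps the output from being the same every time. With a real RNN, only the `predict` argument would change.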