Deep Learning

Name: Deep Learning
Author: Andrew Glassner
ISBN: 9781718500723

June 2021

Intermediate to advanced

768 pages

32h 7m

English

No Starch Press

Title Page
Copyright
Dedication
About the Author
Acknowledgments
Introduction
Who This Book Is For; This Book Has No Complex Math and No Code; There Is Code, If You Want It; The Figures Are Available, Too!; Errata; About This Book; Part I: Foundational Ideas; Part II: Basic Machine Learning; Part III: Deep Learning Basics; Part IV: Beyond the Basics; Final Words
Part I: Foundational Ideas
Chapter 1: An Overview of Machine Learning
Expert Systems; Supervised Learning; Unsupervised Learning; Reinforcement Learning; Deep Learning; Summary
Chapter 2: Essential Statistics
Describing Randomness; Random Variables and Probability Distributions; Some Common Distributions; Continuous Distributions; Discrete Distributions; Collections of Random Values; Expected Value; Dependence; Independent and Identically Distributed Variables; Sampling and Replacement; Selection with Replacement; Selection Without Replacement; Bootstrapping; Covariance and Correlation; Covariance; Correlation; Statistics Don’t Tell Us Everything; High-Dimensional Spaces; Summary
Chapter 3: Measuring Performance
Different Types of Probability; Dart Throwing; Simple Probability; Conditional Probability; Joint Probability; Marginal Probability; Measuring Correctness; Classifying Samples; The Confusion Matrix; Characterizing Incorrect Predictions; Measuring Correct and Incorrect; Accuracy; Precision; Recall; Precision-Recall Tradeoff; Misleading Measures; f1 Score; About These Terms; Other Measures; Constructing a Confusion Matrix Correctly; Summary

Chapter 4: Bayes’ Rule
Frequentist and Bayesian Probability; The Frequentist Approach; The Bayesian Approach; Frequentists vs. Bayesians; Frequentist Coin Flipping; Bayesian Coin Flipping; A Motivating Example; Picturing the Coin Probabilities; Expressing Coin Flips as Probabilities; Bayes’ Rule; Discussion of Bayes’ Rule; Bayes’ Rule and Confusion Matrices; Repeating Bayes’ Rule; The Posterior-Prior Loop; The Bayes Loop in Action; Multiple Hypotheses; Summary
Chapter 5: Curves and Surfaces
The Nature of Functions; The Derivative; Maximums and Minimums; Tangent Lines; Finding Minimums and Maximums with Derivatives; The Gradient; Water, Gravity, and the Gradient; Finding Maximums and Minimums with Gradients; Saddle Points; Summary
Chapter 6: Information Theory
Surprise and Context; Understanding Surprise; Unpacking Context; Measuring Information; Adaptive Codes; Speaking Morse; Customizing Morse Code; Entropy; Cross Entropy; Two Adaptive Codes; Using the Codes; Cross Entropy in Practice; Kullback–Leibler Divergence; Summary
Part II: Basic Machine Learning
Chapter 7: Classification
Two-Dimensional Binary Classification; 2D Multiclass Classification; Multiclass Classification; One-Versus-Rest; One-Versus-One; Clustering; The Curse of Dimensionality; Dimensionality and Density; High-Dimensional Weirdness; Summary
Chapter 8: Training and Testing
Training; Testing the Performance; Test Data; Validation Data; Cross-Validation; k-Fold Cross-Validation; Summary
Chapter 9: Overfitting and Underfitting
Finding a Good Fit; Overfitting; Underfitting; Detecting and Addressing Overfitting; Early Stopping; Regularization; Bias and Variance; Matching the Underlying Data; High Bias, Low Variance; Low Bias, High Variance; Comparing Curves; Fitting a Line with Bayes’ Rule; Summary
Chapter 10: Data Preparation
Basic Data Cleaning; The Importance of Consistency; Types of Data; One-Hot Encoding; Normalizing and Standardizing; Normalization; Standardization; Remembering the Transformation; Types of Transformations; Slice Processing; Samplewise Processing; Featurewise Processing; Elementwise Processing; Inverse Transformations; Information Leakage in Cross-Validation; Shrinking the Dataset; Feature Selection; Dimensionality Reduction; Principal Component Analysis; PCA for Simple Images; PCA for Real Images; Summary
Chapter 11: Classifiers
Types of Classifiers; k-Nearest Neighbors; Decision Trees; Using Decision Trees; Overfitting Trees; Splitting Nodes; Support Vector Machines; The Basic Algorithm; The SVM Kernel Trick; Naive Bayes; Comparing Classifiers; Summary
Chapter 12: Ensembles
Voting; Ensembles of Decision Trees; Bagging; Random Forests; Extra Trees; Boosting; Summary
Part III: Deep Learning Basics
Chapter 13: Neural Networks
Real Neurons; Artificial Neurons; The Perceptron; Modern Artificial Neurons; Drawing the Neurons; Feed-Forward Networks; Neural Network Graphs; Initializing the Weights; Deep Networks; Fully Connected Layers; Tensors; Preventing Network Collapse; Activation Functions; Straight-Line Functions; Step Functions; Piecewise Linear Functions; Smooth Functions; Activation Function Gallery; Comparing Activation Functions; Softmax; Summary
Chapter 14: Backpropagation
A High-Level Overview of Training; Punishing Error; A Slow Way to Learn; Gradient Descent; Getting Started; Backprop on a Tiny Neural Network; Finding Deltas for the Output Neurons; Using Deltas to Change Weights; Other Neuron Deltas; Backprop on a Larger Network; The Learning Rate; Building a Binary Classifier; Picking a Learning Rate; An Even Smaller Learning Rate; Summary
Chapter 15: Optimizers
Error as a 2D Curve; Adjusting the Learning Rate; Constant-Sized Updates; Changing the Learning Rate over Time; Decay Schedules; Updating Strategies; Batch Gradient Descent; Stochastic Gradient Descent; Mini-Batch Gradient Descent; Gradient Descent Variations; Momentum; Nesterov Momentum; Adagrad; Adadelta and RMSprop; Adam; Choosing an Optimizer; Regularization; Dropout; Batchnorm; Summary
Part IV: Beyond the Basics
Chapter 16: Convolutional Neural Networks
Introducing Convolution; Detecting Yellow; Weight Sharing; Larger Filters; Filters and Features; Padding; Multidimensional Convolution; Multiple Filters; Convolution Layers; 1D Convolution; 1×1 Convolutions; Changing Output Size; Pooling; Striding; Transposed Convolution; Hierarchies of Filters; Simplifying Assumptions; Finding Face Masks; Finding Eyes, Noses, and Mouths; Applying Our Filters; Summary
Chapter 17: Convnets in Practice
Categorizing Handwritten Digits; VGG16; Visualizing Filters, Part 1; Visualizing Filters, Part 2; Adversaries; Summary
Chapter 18: Autoencoders
Introduction to Encoding; Lossless and Lossy Encoding; Blending Representations; The Simplest Autoencoder; A Better Autoencoder; Exploring the Autoencoder; A Closer Look at the Latent Variables; The Parameter Space; Blending Latent Variables; Predicting from Novel Input; Convolutional Autoencoders; Blending Latent Variables; Predicting from Novel Input; Denoising; Variational Autoencoders; Distribution of Latent Variables; Variational Autoencoder Structure; Exploring the VAE; Working with the MNIST Samples; Working with Two Latent Variables; Producing New Input; Summary
Chapter 19: Recurrent Neural Networks
Working with Language; Common Natural Language Processing Tasks; Transforming Text into Numbers; Fine-Tuning and Downstream Networks; Fully Connected Prediction; Testing Our Network; Why Our Network Failed; Recurrent Neural Networks; Introducing State; Rolling Up Our Diagram; Recurrent Cells in Action; Training a Recurrent Neural Network; Long Short-Term Memory and Gated Recurrent Networks; Using Recurrent Neural Networks; Working with Sunspot Data; Generating Text; Different Architectures; Seq2Seq; Summary
Chapter 20: Attention and Transformers
Embedding; Embedding Words; ELMo; Attention; A Motivating Analogy; Self-Attention; Q/KV Attention; Multi-Head Attention; Layer Icons; Transformers; Skip Connections; Norm-Add; Positional Encoding; Assembling a Transformer; Transformers in Action; BERT and GPT-2; BERT; GPT-2; Generators Discussion; Data Poisoning; Summary
Chapter 21: Reinforcement Learning
Basic Ideas; Learning a New Game; The Structure of Reinforcement Learning; Step 1: The Agent Selects an Action; Step 2: The Environment Responds; Step 3: The Agent Updates Itself; Back to the Big Picture; Understanding Rewards; Flippers; L-Learning; The Basics; The L-Learning Algorithm; Testing Our Algorithm; Handling Unpredictability; Q-Learning; Q-Values and Updates; Q-Learning Policy; Putting It All Together; The Elephant in the Room; Q-learning in Action; SARSA; The Algorithm; SARSA in Action; Comparing Q-Learning and SARSA; The Big Picture; Summary
Chapter 22: Generative Adversarial Networks
Forging Money; Learning from Experience; Forging with Neural Networks; A Learning Round; Why Adversarial?; Implementing GANs; The Discriminator; The Generator; Training the GAN; GANs in Action; Building a Discriminator and Generator; Training Our Network; Testing Our Network; DCGANs; Challenges; Using Big Samples; Modal Collapse; Training with Generated Data; Summary
Chapter 23: Creative Applications
Deep Dreaming; Stimulating Filters; Running Deep Dreaming; Neural Style Transfer; Representing Style; Representing Content; Style and Content Together; Running Style Transfer; Generating More of This Book; Summary; Final Thoughts
References
Chapter 1; Chapter 2; Chapter 3; Chapter 4; Chapter 5; Chapter 6; Chapter 7; Chapter 8; Chapter 9; Chapter 10; Chapter 11; Chapter 12; Chapter 13; Chapter 14; Chapter 15; Chapter 16; Chapter 17; Chapter 18; Chapter 19; Chapter 20; Chapter 21; Chapter 22; Chapter 23
Image Credits
Chapter 1; Chapter 10; Chapter 16; Chapter 17; Chapter 18; Chapter 23
Index
Part V: Bonus Chapters
Chapter B1: SciKit-Learn
Python Conventions and Libraries Estimators Creation Learning with fit() Predicting with predict() Using decision_function() and predict_proba() Clustering Transformations Inverse Transformations Data Refinement Ensembles Automation Cross-Validation Hyperparameter Searching Exhaustive Grid Search Random Grid Search Pipelines Looking at the Decision Boundary Applying Pipelined Transformations Datasets Utilities Wrapping Up References
Chapter B2: Keras Part 1
The Structure of This Chapter Libraries, Programming, and Debugging Versions and Programming Style Python Programming and Debugging Running Externally A Workaround Note Overview Tensors and Arrays Setting Up Keras Shapes of Tensors Holding Images GPUs and Other Accelerators Getting Started Hello, World Preparing the Data Reshaping Loading the Data Looking at the Data Train-test Splitting Fixing the Data Type Normalizing the Data Fixing the Labels Pre-Processing All in One Place Making the Model Turning Grids into Lists Creating the Model Compiling the Model Model Creation Summary Training the Model Training and Using Our Model Looking at the Output Prediction Analysis of Training History Saving and Loading Saving Everything in One File Saving Just the Weights Saving Just the Architecture Using Pre-Trained Models Saving the Pre-Processing Steps Callbacks Checkpoints Learning Rate Early Stopping Wrapping Up References Image Credits
Chapter B3: Keras Part 2
Improving the Model Counting Up Hyperparameters Changing One Hyperparameter Other Ways to Improve Adding Another Dense Layer Less Is More Adding Dropout Observations Using Scikit-Learn Keras Wrappers Cross-Validation Cross-Validation with Normalization Hyperparameter Searching Convolution Networks Utility Layers Preparing the Data for A CNN Convolution Layers Using Convolution for MNIST Patterns Image Data Augmentation Synthetic Data Parameter Searching for Convnets RNNs Generating Sequence Data RNN Data Preparation Building, Compiling, and Running the RNN Analyzing RNN Performance A More Complex Dataset Deep RNNS The Value of More Data Returning Sequences Stateful RNNs Time-Distributed Layers Generating Text The Functional API Input Layers Making A Functional Model Summary References Image Credits

Content preview from Deep Learning

15 Optimizers

Training neural networks is frequently a time-consuming process. Anything that speeds it up is a welcome addition to our toolkit. This chapter is about a family of tools that are designed to speed up learning by improving the efficiency of gradient descent. The goals are to make gradient descent run faster and avoid some of the problems that can cause it to get stuck. These tools also automate some of the work of finding the best learning rate, including algorithms that can adjust that rate automatically over time. Collectively, these algorithms are called optimizers. Each optimizer has its strengths and weaknesses, so it’s ...
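As a rough illustration of the ideas in this preview, the Python sketch below runs gradient descent on a one-dimensional bowl, f(x) = (x - 3)^2, with two optional additions that the chapter outline names: a learning rate that decays over time, and momentum. The function name, the test function, the 1/t decay formula, and every parameter value are assumptions chosen for this example. None of them are taken from the book.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=50,
                     decay=0.0, momentum=0.0):
    """Minimize a 1D function, given a function `grad` that returns its slope.

    decay > 0 shrinks the learning rate as training proceeds.
    momentum > 0 carries part of the previous update into the next one.
    Both are illustrative choices, not the book's formulation.
    """
    x = x0
    velocity = 0.0
    for step in range(steps):
        # Assumed schedule: the rate falls off as 1 / (1 + decay * step).
        lr = learning_rate / (1.0 + decay * step)
        # With momentum == 0 this reduces to plain gradient descent.
        velocity = momentum * velocity - lr * grad(x)
        x += velocity
    return x


def grad(x):
    # Derivative of the hypothetical error curve f(x) = (x - 3)^2.
    return 2.0 * (x - 3.0)


plain = gradient_descent(grad, x0=0.0)
with_decay = gradient_descent(grad, x0=0.0, decay=0.01)
with_momentum = gradient_descent(grad, x0=0.0, momentum=0.5)

print("plain:", plain)
print("with decay:", with_decay)
print("with momentum:", with_momentum)
```

All three runs should end close to the minimum at x = 3. How quickly each gets there depends on the settings, which is the kind of tradeoff that separates one optimizer from another.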


Publisher Resources

ISBN: 9781098129019
Errata Page
