Chapter 12. Memory Augmented Neural Networks
So far we've seen how effective an RNN can be at solving a complex problem like machine translation. However, we're still far from reaching its full potential! In Chapter 9 we mentioned that it is theoretically proven that the RNN architecture is a universal functional representer; a more precise statement of the same result is that RNNs are Turing complete. This simply means that, given proper wiring and adequate parameters, an RNN can learn to solve any computable problem, that is, any problem that can be solved by a computer algorithm or, equivalently, a Turing machine.
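To make that claim concrete, recall the recurrence that gives an RNN its computational power: the hidden state acts as a small working memory that is rewritten at every time step as a function of the previous state and the current input. Here is a minimal sketch in NumPy; the dimensions, names, and random weights are invented purely for illustration and are not taken from the book's models:

import numpy as np

# Toy dimensions, invented purely for illustration.
input_size, hidden_size = 4, 8

rng = np.random.default_rng(0)
W_xh = 0.1 * rng.standard_normal((hidden_size, input_size))   # input-to-hidden weights
W_hh = 0.1 * rng.standard_normal((hidden_size, hidden_size))  # recurrent weights
b_h = np.zeros(hidden_size)

def rnn_step(h_prev, x_t):
    # Vanilla RNN recurrence: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h).
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

h = np.zeros(hidden_size)                          # initial hidden state
sequence = rng.standard_normal((5, input_size))    # a length-5 toy input sequence
for x_t in sequence:
    h = rnn_step(h, x_t)   # the state carries information forward in time

Because the same weights are applied at every step, the network behaves like a fixed program iterating over its input, which is what makes the Turing-completeness result plausible.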
Neural Turing Machines
Though theoretically possible, it's extremely difficult to achieve that kind of universality in practice. This difficulty stems from the fact that we're searching an immense space of possible RNN wirings and parameter values, a space so vast that gradient descent is unlikely to find an appropriate solution for an arbitrary problem. However, in this chapter we'll start exploring some approaches at the edge of research that will allow us to start tapping into that potential.
Let's think for a while about a very simple reading comprehension question like the following:
Mary travelled to the hallway. She grabbed the milk glass there. Then she travelled to the office, where she found an apple and grabbed it. How many objects is Mary carrying?
The answer is trivial: it's two. But what actually ...
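Answering even this question demands a working memory that can store facts ("Mary grabbed the milk") and retrieve them later, and that is exactly what a Neural Turing Machine supplies: a controller network coupled to an external memory matrix that is read and written through soft, fully differentiable attention. As a rough sketch of the core mechanism, the following NumPy code (with invented shapes and names such as content_addressing; an illustrative assumption, not the book's implementation) shows content-based addressing: the controller emits a key, the key is compared against every memory row by cosine similarity, and a softmax with key strength beta turns the similarities into read weights:

import numpy as np

def content_addressing(memory, key, beta):
    # memory: (N, M) array of N slots, each of width M.
    # key:    (M,) query vector emitted by the controller.
    # beta:   scalar key strength; larger beta sharpens the focus.
    sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    w = np.exp(beta * sims)    # softmax over slots -> differentiable read weights
    w /= w.sum()
    return w @ memory, w       # the read vector is a weighted blend of all rows

memory = np.eye(4, 5)                        # toy memory: 4 slots of width 5
key = np.array([1.0, 0.1, 0.0, 0.0, 0.0])    # query resembling the first slot
read_vector, weights = content_addressing(memory, key, beta=10.0)

Because every operation here is smooth, gradients flow through the memory access itself, which is what lets the whole system be trained end to end with gradient descent.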