Fundamentals of Deep Learning

Name: Fundamentals of Deep Learning
Author: Nikhil Buduma
ISBN: 9781491925614

June 2017

Intermediate to advanced

296 pages

8h 23m

English

O'Reilly Media, Inc.

Preface
Prerequisites and Objectives; Conventions Used in This Book; Using Code Examples; Safari® Books Online; How to Contact Us; Acknowledgements
1. The Neural Network
Building Intelligent Machines; The Limits of Traditional Computer Programs; The Mechanics of Machine Learning; The Neuron; Expressing Linear Perceptrons as Neurons; Feed-Forward Neural Networks; Linear Neurons and Their Limitations; Sigmoid, Tanh, and ReLU Neurons; Softmax Output Layers; Looking Forward
2. Training Feed-Forward Neural Networks
The Fast-Food Problem; Gradient Descent; The Delta Rule and Learning Rates; Gradient Descent with Sigmoidal Neurons; The Backpropagation Algorithm; Stochastic and Minibatch Gradient Descent; Test Sets, Validation Sets, and Overfitting; Preventing Overfitting in Deep Neural Networks; Summary
3. Implementing Neural Networks in TensorFlow
What Is TensorFlow?; How Does TensorFlow Compare to Alternatives?; Installing TensorFlow; Creating and Manipulating TensorFlow Variables; TensorFlow Operations; Placeholder Tensors; Sessions in TensorFlow; Navigating Variable Scopes and Sharing Variables; Managing Models over the CPU and GPU; Specifying the Logistic Regression Model in TensorFlow; Logging and Training the Logistic Regression Model; Leveraging TensorBoard to Visualize Computation Graphs and Learning; Building a Multilayer Model for MNIST in TensorFlow; Summary
4. Beyond Gradient Descent
The Challenges with Gradient Descent; Local Minima in the Error Surfaces of Deep Networks; Model Identifiability; How Pesky Are Spurious Local Minima in Deep Networks?; Flat Regions in the Error Surface; When the Gradient Points in the Wrong Direction; Momentum-Based Optimization; A Brief View of Second-Order Methods; Learning Rate Adaptation; AdaGrad—Accumulating Historical Gradients; RMSProp—Exponentially Weighted Moving Average of Gradients; Adam—Combining Momentum and RMSProp; The Philosophy Behind Optimizer Selection; Summary
5. Convolutional Neural Networks
Neurons in Human Vision; The Shortcomings of Feature Selection; Vanilla Deep Neural Networks Don’t Scale; Filters and Feature Maps; Full Description of the Convolutional Layer; Max Pooling; Full Architectural Description of Convolution Networks; Closing the Loop on MNIST with Convolutional Networks; Image Preprocessing Pipelines Enable More Robust Models; Accelerating Training with Batch Normalization; Building a Convolutional Network for CIFAR-10; Visualizing Learning in Convolutional Networks; Leveraging Convolutional Filters to Replicate Artistic Styles; Learning Convolutional Filters for Other Problem Domains; Summary
6. Embedding and Representation Learning
Learning Lower-Dimensional Representations; Principal Component Analysis; Motivating the Autoencoder Architecture; Implementing an Autoencoder in TensorFlow; Denoising to Force Robust Representations; Sparsity in Autoencoders; When Context Is More Informative than the Input Vector; The Word2Vec Framework; Implementing the Skip-Gram Architecture; Summary
7. Models for Sequence Analysis
Analyzing Variable-Length Inputs; Tackling seq2seq with Neural N-Grams; Implementing a Part-of-Speech Tagger; Dependency Parsing and SyntaxNet; Beam Search and Global Normalization; A Case for Stateful Deep Learning Models; Recurrent Neural Networks; The Challenges with Vanishing Gradients; Long Short-Term Memory (LSTM) Units; TensorFlow Primitives for RNN Models; Implementing a Sentiment Analysis Model; Solving seq2seq Tasks with Recurrent Neural Networks; Augmenting Recurrent Networks with Attention; Dissecting a Neural Translation Network; Summary
8. Memory Augmented Neural Networks
Neural Turing Machines; Attention-Based Memory Access; NTM Memory Addressing Mechanisms; Differentiable Neural Computers; Interference-Free Writing in DNCs; DNC Memory Reuse; Temporal Linking of DNC Writes; Understanding the DNC Read Head; The DNC Controller Network; Visualizing the DNC in Action; Implementing the DNC in TensorFlow; Teaching a DNC to Read and Comprehend; Summary
9. Deep Reinforcement Learning
Deep Reinforcement Learning Masters Atari Games; What Is Reinforcement Learning?; Markov Decision Processes (MDP); Policy; Future Return; Discounted Future Return; Explore Versus Exploit; Policy Versus Value Learning; Policy Learning via Policy Gradients; Pole-Cart with Policy Gradients; OpenAI Gym; Creating an Agent; Building the Model and Optimizer; Sampling Actions; Keeping Track of History; Policy Gradient Main Function; PGAgent Performance on Pole-Cart; Q-Learning and Deep Q-Networks; The Bellman Equation; Issues with Value Iteration; Approximating the Q-Function; Deep Q-Network (DQN); Training DQN; Learning Stability; Target Q-Network; Experience Replay; From Q-Function to Policy; DQN and the Markov Assumption; DQN’s Solution to the Markov Assumption; Playing Breakout with DQN; Building Our Architecture; Stacking Frames; Setting Up Training Operations; Updating Our Target Q-Network; Implementing Experience Replay; DQN Main Loop; DQNAgent Results on Breakout; Improving and Moving Beyond DQN; Deep Recurrent Q-Networks (DRQN); Asynchronous Advantage Actor-Critic Agent (A3C); UNsupervised REinforcement and Auxiliary Learning (UNREAL); Summary

Index

Content preview from Fundamentals of Deep Learning

Chapter 7. Models for Sequence Analysis

Surya Bhupatiraju

Analyzing Variable-Length Inputs

Up until now, we’ve only worked with fixed-size data: images from MNIST, CIFAR-10, and ImageNet. The models we built for these datasets are powerful, but there are many situations in which fixed-length models are insufficient. The vast majority of interactions in our daily lives require a deep understanding of sequences, whether it’s reading the morning newspaper, making a bowl of cereal, listening to the radio, watching a presentation, or deciding to execute a trade on the stock market. To adapt to variable-length inputs, we’ll have to be more clever about how we design deep learning models.

In Figure 7-1, we illustrate how our feed-forward neural networks break when analyzing sequences. If the sequence is the same size as the input layer, the model performs as we expect. It’s even possible to handle smaller inputs by appending zeros to the input until it reaches the appropriate length. However, the moment the input exceeds the size of the input layer, naively using the feed-forward network no longer works.
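
The zero-padding idea can be sketched in a few lines of plain Python. This is only an illustration, not code from the book: the name `pad_to_input_size` and the input width `INPUT_SIZE = 8` are hypothetical choices made for the example. The function fills short sequences with trailing zeros and raises an error for sequences longer than the input layer, which is the case where a fixed-size network has no obvious answer.

```python
from typing import List, Sequence

INPUT_SIZE = 8  # hypothetical width of the network's input layer


def pad_to_input_size(
    sequence: Sequence[float], input_size: int = INPUT_SIZE
) -> List[float]:
    """Right-pad `sequence` with zeros so it fills a fixed-size input layer.

    Raises ValueError when the sequence is longer than the input layer,
    the case a plain feed-forward network cannot handle.
    """
    if len(sequence) > input_size:
        raise ValueError(
            f"sequence of length {len(sequence)} exceeds input size {input_size}"
        )
    # Append zeros until the vector matches the input layer's width.
    return list(sequence) + [0.0] * (input_size - len(sequence))


short = pad_to_input_size([0.3, 0.7, 0.1])     # three values plus five zeros
exact = pad_to_input_size([1.0] * INPUT_SIZE)  # already the right length

try:
    pad_to_input_size([1.0] * (INPUT_SIZE + 1))  # one element too many
    too_long_rejected = False
except ValueError:
    too_long_rejected = True
```

Truncating the overlong sequence instead of raising would also make the shapes match, but it silently discards part of the input, so this sketch treats that case as an error.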

Not all hope is lost, however. In the ...

Publisher Resources

ISBN: 9781491925607

Figure 7-1. Feed-forward networks thrive on problems with a fixed input size. Zero padding handles smaller inputs, but when used naively, these models break once inputs exceed the fixed input size.
