Chapter 6. Recurrent Neural Networks
In this chapter, we'll cover recurrent neural networks (RNNs), a class of neural network architectures designed for handling sequences of data. The neural networks we've seen so far treated each batch of data they received as a set of independent observations: there was no notion of some MNIST digits arriving before or after others, either in the fully connected neural networks we saw in Chapter 4 or in the convolutional neural networks we saw in Chapter 5. Many kinds of data, however, are intrinsically ordered, whether time series data, which one might deal with in an industrial or financial context, or language data, in which the characters, words, sentences, and so on are ordered. Recurrent neural networks are designed to learn how to take in such sequences and return a correct prediction, whether that prediction is of the price of a financial asset on the following day or of the next word in a sentence.
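To make the "handling sequences" idea concrete before we get into the details, here is a minimal sketch of a single step of a vanilla recurrent cell in NumPy. This is not the book's implementation; the function name rnn_step, the weight names W_xh, W_hh, and b_h, and all of the sizes are illustrative. The key point is that a hidden state carries information forward from one element of the sequence to the next:

```python
import numpy as np

def rnn_step(x_t: np.ndarray,
             h_prev: np.ndarray,
             W_xh: np.ndarray,
             W_hh: np.ndarray,
             b_h: np.ndarray) -> np.ndarray:
    """One step of a vanilla RNN: combine the current sequence element
    with the previous hidden state to produce the new hidden state."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Illustrative shapes (assumptions, not the book's): a batch of 32
# observations, 10 features per sequence element, hidden size 16.
rng = np.random.default_rng(0)
x_t = rng.standard_normal((32, 10))        # current element of each sequence
h_prev = np.zeros((32, 16))                # hidden state from the previous step
W_xh = rng.standard_normal((10, 16)) * 0.1
W_hh = rng.standard_normal((16, 16)) * 0.1
b_h = np.zeros(16)

h_t = rnn_step(x_t, h_prev, W_xh, W_hh, b_h)
print(h_t.shape)  # (32, 16)
```

Applying this step repeatedly, once per element of the sequence, is what lets the network's prediction at each step depend on everything that came before it.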
Dealing with ordered data will require three kinds of changes to the fully connected neural networks we saw in the first few chapters. First, it will involve "adding a new dimension" to the ndarrays we feed our neural networks. Previously, the data we fed our neural networks was intrinsically two-dimensional: each ndarray had one dimension representing the number of observations and another representing the number of features; another way to think of this is that each observation was a one-dimensional vector.
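As a quick illustration of that extra dimension (a sketch with made-up shapes; the [batch_size, sequence_length, num_features] layout shown here is a common convention, not necessarily the exact one this book adopts):

```python
import numpy as np

# Fully connected networks (Chapters 4-5): each batch is 2D,
# [num_observations, num_features], so each observation is a 1D vector.
batch_2d = np.random.randn(32, 10)
print(batch_2d.shape)     # (32, 10)
print(batch_2d[0].shape)  # (10,) -- one observation, one feature vector

# Recurrent networks: each observation is itself a sequence, so a batch
# becomes 3D, [num_observations, sequence_length, num_features].
batch_3d = np.random.randn(32, 25, 10)
print(batch_3d.shape)     # (32, 25, 10)
print(batch_3d[0].shape)  # (25, 10) -- one observation is now a
                          # sequence of 25 feature vectors
```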