4.1 Introduction to the MaxLike principle: The mother of all loss functions
4.2 Deriving a loss function for a classification problem
4.2.1 Binary classification problem
4.2.2 Classification problems with more than two classes
4.2.3 Relationship between NLL, cross entropy, and Kullback-Leibler divergence
4.3 Deriving a loss function for regression problems
4.3.1 Using an NN without hidden layers and one output neuron for modeling a linear relationship between input and output
4.3.2 Using an NN with hidden layers to model non-linear relationships between input and output
4.3.3 Using an NN with an additional output for regression tasks with nonconstant variance
Summary