As a general rule of thumb, ReLU is used as the activation function for our intermediate hidden layers (that is, the non-output layers). In 2011, researchers demonstrated that ReLU works better than previously used activation functions for training deep neural networks (DNNs). Today, ReLU is the most popular activation function for DNNs and has become the default choice.
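As a sketch of how this rule of thumb looks in practice, assuming TensorFlow's Keras API and purely illustrative layer sizes and input shape, the hidden layers below use ReLU while the output layer uses a different activation (here sigmoid, for a binary classification output):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(64, activation='relu', input_shape=(10,)),  # hidden layer 1: ReLU
    Dense(32, activation='relu'),                     # hidden layer 2: ReLU
    Dense(1, activation='sigmoid'),                   # output layer: not ReLU
])
model.summary()
```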
Mathematically, we can represent ReLU as follows:

ReLU(x) = max(0, x)
What the ReLU function does is simply keep the non-negative portion of the original input, x, and treat the negative portion as 0. The following graph illustrates this:
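As a quick numerical sketch of the same behaviour (assuming only NumPy; the sample input values are illustrative), negative values are mapped to 0 and non-negative values pass through unchanged:

```python
import numpy as np

def relu(x):
    """Element-wise ReLU: max(0, x)."""
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))
# -> [0.  0.  0.  1.5 3. ]
```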