Non-linearity function

An activation function maps the weighted input of a neuron to a real value that becomes the neuron's output. Many of a NN's properties depend on the choice of activation function, including its ability to generalize and the convergence speed of the training process. Usually, we want the activation function to be differentiable, so that we can optimize the whole network using gradient descent. The most commonly used activation functions are non-linear: piecewise linear or s-shaped (see Table 8.1). Non-linear activation functions allow NNs to outperform other algorithms on many nontrivial tasks using only a few neurons. Oversimplifying, activation functions can be divided into two groups: step-like and rectifier-like (see Figure 8.3). Let's ...
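To make the two groups concrete, here is a minimal Swift sketch of a step-like activation (the Heaviside step), an s-shaped one (the sigmoid), and a rectifier-like one (ReLU), together with the derivatives that gradient descent relies on. The function names and signatures are illustrative choices for this sketch, not code from the book itself.

```swift
import Foundation

// A minimal sketch of common activation functions and their derivatives.
// Names and signatures are illustrative, not taken from the book's code.

/// Heaviside step: the classic "step-like" activation.
/// Note that it is not differentiable at 0, which is one reason
/// smooth alternatives are preferred for gradient-based training.
func step(_ x: Double) -> Double {
    return x >= 0 ? 1.0 : 0.0
}

/// Sigmoid: a smooth, s-shaped activation mapping inputs into (0, 1).
func sigmoid(_ x: Double) -> Double {
    return 1.0 / (1.0 + exp(-x))
}

/// Derivative of the sigmoid, used during backpropagation.
func sigmoidDerivative(_ x: Double) -> Double {
    let s = sigmoid(x)
    return s * (1.0 - s)
}

/// ReLU: the standard "rectifier-like" piecewise-linear activation.
func relu(_ x: Double) -> Double {
    return max(0.0, x)
}

/// Derivative of ReLU (conventionally taken as 0 at x == 0).
func reluDerivative(_ x: Double) -> Double {
    return x > 0 ? 1.0 : 0.0
}

// Quick comparison of the shapes: step-like vs. s-shaped vs. rectifier-like.
for x in [-2.0, -0.5, 0.0, 0.5, 2.0] {
    print(x, step(x), sigmoid(x), relu(x))
}
```

The derivative functions illustrate why differentiability matters: backpropagation multiplies these local derivatives together to compute the gradients that drive the weight updates.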