A sigmoid maps any real-valued input to a value strictly between 0 and 1, so its output can be interpreted as a probability. In other words, it squashes an arbitrary range of values into the interval (0, 1). Sigmoid functions are widely used in binary classification tasks, where the output is treated as the probability that the input belongs to the class. The following diagram shows a graph of sigmoid activation:
As the preceding diagram shows, the function resembles the unit step function, but is smoother. This smoothness makes the function differentiable over its entire domain, which is necessary during the training of the network, as will be discussed in a later section.
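To make the comparison concrete, the following is a minimal sketch in plain Python, assuming the standard logistic form sigmoid(x) = 1 / (1 + e^(-x)), whose derivative is sigmoid(x) * (1 - sigmoid(x)). The function names `sigmoid`, `sigmoid_derivative`, and `unit_step` are illustrative choices rather than names from any particular library, and the step function's value at exactly 0 is set to 1 here by convention.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, exp(-x) could overflow, so use the equivalent form
    # exp(x) / (1 + exp(x)), where exp(x) stays below 1.
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_derivative(x: float) -> float:
    """Slope of the sigmoid at x: sigmoid(x) * (1 - sigmoid(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def unit_step(x: float) -> float:
    """Unit step for comparison (assumed convention: value 1 at x == 0)."""
    return 1.0 if x >= 0 else 0.0


# Compare the hard step with the smooth sigmoid at a few sample points.
for x in (-6.0, -1.0, 0.0, 1.0, 6.0):
    print(
        f"x={x:+.1f}  step={unit_step(x):.0f}  "
        f"sigmoid={sigmoid(x):.4f}  slope={sigmoid_derivative(x):.4f}"
    )
```

At x = 0 the sigmoid returns 0.5 and its slope is 0.25, the largest value the slope takes. The slope is defined for every input, whereas the unit step jumps at 0 and is flat everywhere else, which gives gradient-based training nothing to work with.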