Recall the artificial neural network nodes described in the previous section. If all the nodes work the same way, or at least produce the same kind of result (0 or 1), how does the network differentiate between classes?

It uses weights assigned to each node's inputs. Each input is a feature (a variable in the data) that can be given a large or small weight, and that weight determines how much the feature contributes to the sum in any node.

In practice, a variable can be assigned a large weight feeding into one node and an almost zero weight feeding into another, so that it has a strong influence on the first node and practically none on the second.
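To make the effect of the weights concrete, here is a minimal sketch of two nodes that receive the same inputs but weight them differently. The feature values, the weights, the 0.5 threshold, and the helper names `weighted_sum` and `step` are all hypothetical, chosen only for illustration. A simple threshold is used here to produce the 0-or-1 result; it is an assumption standing in for whatever activation a real network would use.

```python
def weighted_sum(inputs, weights):
    """Multiply each input by its weight and add up the results."""
    return sum(x * w for x, w in zip(inputs, weights))


def step(total, threshold=0.5):
    """Hypothetical activation: output 1 if the sum reaches the threshold, else 0."""
    return 1 if total >= threshold else 0


# Two illustrative feature values fed to both nodes (made-up data).
features = [1.0, 1.0]

# Node A gives the first feature a large weight; node B gives it almost none.
weights_node_a = [0.9, 0.1]
weights_node_b = [0.001, 0.1]

# Contribution of the first feature to each node's sum.
contribution_a = features[0] * weights_node_a[0]
contribution_b = features[0] * weights_node_b[0]

sum_a = weighted_sum(features, weights_node_a)
sum_b = weighted_sum(features, weights_node_b)

out_a = step(sum_a)
out_b = step(sum_b)

print("node A:", sum_a, "->", out_a)
print("node B:", sum_b, "->", out_b)
```

Under these assumed values, the same first feature dominates node A's sum and barely registers in node B's, so the two nodes can produce different outputs from identical inputs.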

The sigmoid function (which we also mentioned ...
