Biased data
The Word2Vec algorithm (discussed in Chapter 10, Natural Language Processing) is a good example of how easily cultural stereotypes and prejudices leak into machine learning models. For instance, vectors trained on the Google News corpus tell us that:
USA - Pizza + Russia = Vodka
While this may sound funny to some people, it is equally offensive to many others. Is the algorithm biased? No; the bias is all in the dataset.
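If you want to check such analogies yourself, the query can be reproduced with the gensim library and the pretrained Google News vectors. The following is a minimal sketch, not the exact code behind the example above; the file name and the casing of the tokens are assumptions, so verify them against the model you actually download.

```python
# Minimal sketch: word-vector analogy query with gensim (assumed setup).
from gensim.models import KeyedVectors

# Pretrained 300-dimensional Google News vectors (large download; the exact
# file name below is an assumption based on the commonly distributed archive).
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

# "USA - pizza + Russia": add the positive words, subtract the negative one,
# and list the words closest to the resulting vector.
for word, similarity in vectors.most_similar(
    positive=["USA", "Russia"], negative=["pizza"], topn=5
):
    print(word, round(similarity, 3))
```

The point of running such a query is not the arithmetic itself, but seeing how directly the associations present in the training corpus surface in the results.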
Another example of badly biased data was a web service based on a neural network that assessed the beauty of a face from a photo. Apparently, the training data contained only white faces, so the model gave the lowest scores to all non-white faces. I truly believe that the developers had no bad intentions ...