Chapter 51. Algorithmic Misclassification—the (Pretty) Good, the Bad, and the Ugly

Arnobio Morelix

Every day, the systems we build classify people’s identities and behavior nonstop. A credit card transaction is labeled “fraudulent” or not. Political campaigns decide who counts as a “likely voter” for their candidate. People constantly assert, and are judged on, their identity as “not a robot” through CAPTCHAs. Add to this the classification of emails, face recognition in phones, and targeted ads, and it is easy to imagine thousands of such classification instances per day for even just one person.

For the most part, these classifications are convenient and pretty good for both users and the organizations running them. We mostly forget about them unless they go obviously wrong.

I am a Latino living in the US, and I often get ads in Spanish—which would be pretty good targeting, except that I am a Brazilian Latino, and my native language is Portuguese, not Spanish.

This particular misclassification causes no real harm to me. My online behavior might look similar enough to that of a native Spanish speaker living in the US, and users like me getting mistargeted ads may be nothing more than a “rounding error” by the algorithm. Although it is in no one’s interest that I get these ads—I am wasting my time, and the company ...
