Chapter 1. The Truth About AI Bias

Cassie Kozyrkov

No technology is free of its creators. Despite our fondest sci-fi wishes, there’s no such thing as AI systems that are truly separate and autonomous...because they start with us. Though its effect can linger long after you’ve pressed a button, all technology is an echo of the wishes of whoever built it.

Data and Math Don’t Equal Objectivity

If you’re looking to AI as your savior from human foibles, tread carefully. Sure, data and math can increase the amount of information you use in decision making and/or save you from heat-of-the-moment silliness, but how you use them is still up to you.

Look, I know sci-fi sells. It’s much flashier to say “The AI learned to do this task all by itself” than to tell the truth: People used a tool with a cool name to help them write code. They fed in examples they considered appropriate, found some patterns in them, and turned those patterns into instructions. Then they checked whether they liked what those instructions did for them.

The truth drips with human subjectivity—look at all those little choices along the way that are left up to people running the project. What shall we apply AI to? Is it worth doing? In which circumstances? How shall we define success? How well does it need to work? The list goes on and on.

Tragicomically, adding data to the mix obscures the ever-present human element and creates an illusion of objectivity. Wrapping a glamorous coat of math around the core doesn’t make it any less squishy.

Technology always comes from and is designed by people, which means it’s no more objective than we are.

What Is Algorithmic Bias?

Algorithmic bias refers to situations in which a computer system reflects the implicit values of the people who created it. By this definition, even the most benign computer systems are biased; when we apply math toward a purpose, that purpose is shaped by the sensibilities of our times. Is AI exempt? Not at all. Stop thinking of AI as an entity and see it for what it really is: an excellent tool for writing code.

The whole point of AI is to let you explain your wishes to a computer using examples (data!) instead of instructions. Which examples? That depends on what you’re trying to teach your system to do. Think of your dataset as the textbook you’re asking your machine student to learn from.

Datasets Have Human Authors

When I’ve said that “AI bias doesn’t come from AI algorithms, it comes from people,” some folks have written to tell me that I’m wrong because bias comes from data. Well, we can both be winners...because people make the data. Like textbooks, datasets reflect the biases of their authors.

Consider the following image.

[Image not shown: a picture that includes bananas, shelves, and a roll of plastic bags.]

Was your first thought “bananas”? Why didn’t you mention the plastic bag roll, or the color of the bananas? This example comes from Google’s AI Fairness training course and demonstrates that although all three answers are technically correct, you have a bias to prefer one of them. Not all people would share that bias; what we perceive and how we respond is influenced by our norms. If you live on a planet where all bananas are blue, you might answer “yellow bananas” here. If you’ve never seen a banana before, you might say “shelves with yellow stuff on them.” Both answers are also correct.

The data you create for your system to learn from will be biased by how you see the world.

This Is No Excuse to Be a Jerk

Philosophical arguments invalidating the existence of truly unbiased and objective technology don’t give anyone an excuse to be a jerk. If anything, the fact that you can’t pass the ethical buck to a machine puts more responsibility on your shoulders, not less.

Sure, our perceptions are shaped by our times. Societal ideas of virtue, justice, kindness, fairness, and honor aren’t the same today as they were for people living a few thousand years ago, and they may keep evolving. That doesn’t make these ideas unimportant; it only means we can’t outsource them to a heap of wires. They’re the responsibility of all of us, together.

Fairness in AI

Once you appreciate that you are responsible for how you use your tools and where you point them, strive to make yourself aware of how your choices affect the rest of humanity. For example, deciding which application to pursue is a choice that affects other people. Think it through.

Another choice you have is which data to use for AI. You should expect better performance on examples that are similar to what your system learned from. If you choose not to use data from people like me, your system is more likely to make a mistake when I show up as your user. It’s your duty to think about the pain you could cause when that happens.

At a bare minimum, I hope you’d have the common sense to check whether the distribution of your user population matches the distribution in your data. For example, if 100% of your training examples come from residents of a single country, but your target users are global...expect a mess.
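To make that bare-minimum check concrete, here is a minimal sketch in Python. It assumes you can attach one categorical label (a country, say) to each training example and to each expected user. The function names, the group labels, and the made-up numbers are all hypothetical, and total variation distance is just one reasonable way to summarize the mismatch, not the only one.

```python
from collections import Counter


def shares(labels):
    """Return the fraction of items that fall into each group."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def share_gaps(train_labels, user_labels):
    """Per-group gap: share among users minus share in the training data.

    A positive gap means the group is underrepresented in the training data.
    """
    train = shares(train_labels)
    users = shares(user_labels)
    groups = sorted(set(train) | set(users))
    return {g: users.get(g, 0.0) - train.get(g, 0.0) for g in groups}


def total_variation(train_labels, user_labels):
    """Single-number summary: 0 means identical mixes, 1 means no overlap."""
    gaps = share_gaps(train_labels, user_labels)
    return 0.5 * sum(abs(gap) for gap in gaps.values())


# Hypothetical data: every training example comes from country "A",
# but the expected users are spread across three countries.
train_countries = ["A"] * 100
user_countries = ["A"] * 25 + ["B"] * 50 + ["C"] * 25

print(share_gaps(train_countries, user_countries))
print(total_variation(train_countries, user_countries))
```

A large gap doesn't tell you what to do about it. It only tells you where to start asking questions.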

Fair and Aware

I’ve written a lot of words here, when I could have just told you that most of the research on the topic of bias and fairness in AI is about making sure that your system doesn’t have a disproportionate effect on some group of users relative to other groups. The primary focus of AI ethics is on distribution checks and similar analytics.
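One simple version of such a check is to compare how often the system gets things wrong for each group of users. The sketch below is only an illustration under stated assumptions: each example carries a group label, a true answer, and the system's answer, and the function name and toy data are hypothetical. Real fairness analyses use many other metrics, and which one matters is, once again, a human choice.

```python
from collections import defaultdict


def error_rate_by_group(groups, y_true, y_pred):
    """Return the fraction of wrong predictions within each group."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, truth, prediction in zip(groups, y_true, y_pred):
        totals[group] += 1
        if truth != prediction:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Hypothetical toy data: four examples from group "x", two from group "y".
groups = ["x", "x", "x", "x", "y", "y"]
y_true = [1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0]

rates = error_rate_by_group(groups, y_true, y_pred)
print(rates)
```

In this made-up example the system is never wrong for group "x" and wrong half the time for group "y", which is exactly the kind of gap these checks are meant to surface.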

The reason I wrote so much is that I want you to do even better. Automated distribution checks go only so far. No one knows a system better than its creators, so if you’re building one, take the time to think about whom your actions will affect and how, and do your best to give those people a voice to guide you through your blind spots.
