Power, Harms, and Data
Data is often biased. But that isn’t the real issue. Why is it biased? How do we build teams that are sensitive to that bias?
A recent article in The Verge discussed PULSE, an algorithm for “upsampling” digital images. PULSE, when applied to a low-resolution image of Barack Obama, recreated a White man’s face; applied to Alexandria Ocasio-Cortez, it built a White woman’s face. It had similar problems with other images of Black and Hispanic people, frequently giving them White skin and facial features.
PULSE could be used for applications like upsampling video for 8K ultra high-definition, but I’m less interested in the algorithm and its applications than in the discussion about ethics that it provoked. Is this just a problem with training data, as Yann LeCun said on Twitter? Or is it a sign of larger systemic issues about bias and power, as Timnit Gebru argued? The claim that this is only a problem with the data is tempting, but it is important to step back and see the bigger issues: nothing is “just” a problem with data. That shift to a wider perspective is badly needed.
There’s no question that the training data was a problem. If the algorithm were trained using a set of photos dominated by Black people, it would no doubt turn White faces into Black ones. With the right training set and training process, we could presumably minimize errors. When looked at this way, it’s largely a problem of mathematics and statistics. That’s the position that Timnit Gebru rejects, because it obscures the bigger issues hiding behind the training set. As organizations like Data For Black Lives, Black in AI, the Algorithmic Justice League, and others have been pointing out, it’s never just an issue of statistics. It’s an issue of harms and of power. Who stands to gain? Who stands to lose? That’s the point we really need to consider, particularly when we’re asking AI to create “information” where nothing existed before. Who controls the erasure, or the creation, of color? What are the assumptions that lie behind it?
I do not believe there are many AI researchers giggling about turning Black people into White people (though there are no doubt some). Nor do I believe there’s some kind of racist demon lurking in the mathematics implemented by neural networks. But errors like this nevertheless happen; they happen all too frequently; the results are often harmful; and none of us are surprised that the transition was from Black to White rather than the other way around. We were not surprised when we found that products like COMPAS recommended tougher criminal sentences for Black people than for White people; nor were we surprised when Timnit Gebru and Joy Buolamwini showed that facial recognition is much less accurate for Black people than for White people, and particularly inaccurate for Black women.
So, how do we think about the problem of power and race in AI? Timnit Gebru is right; saying that the problem is in the training data ignores the real problem. As does being saddened and making vague promises about doing better in the future. If we aren’t surprised, why? What do we have to learn, and how do we put that learning into practice?
We can start by considering what “biased training data” means. One of my favorite collections of essays about data is titled “Raw Data” Is an Oxymoron. There is no such thing as “raw data,” and hence no pure, unadulterated, unbiased data. Data is always historical and, as such, is the repository of historical bias. Data doesn’t just grow, like trees; data is collected, and the process of data collection often has its own agenda. Therefore, there are different ways of understanding data and different ways of telling stories about data, some of which account for its origin and relation to history, and some of which don’t.
Take, for example, housing data. That data will show that, in most places in the US, Black people live in separate neighborhoods from White people. But there are a number of stories we can tell about that data. Here are two very different stories:
- Segregated housing reflects how people want to live: Black people want to live near other Black people, and so on.
- Segregated housing reflects many years of policy aimed at excluding Black people from White neighborhoods: lending policy, educational policy, real estate policy.
There are many variations on those stories, but those two are enough. Neither is entirely wrong, though the first story erases an important fact: White people have generally had the power to prevent Black people from moving into their neighborhoods. The second story doesn’t treat that data as an immutable given; it critiques the data, asking how and why it came to be. As I’ve argued elsewhere, AI is capable of revealing our biases and showing us where they are hidden. It gives us an opportunity to learn about and critique our own institutions. If you don’t look critically at the data, its origins, and its stories (something that’s not a part of most computer science curricula), you’re likely to institutionalize the bias embedded in the data behind a wall of mathwashing.
There are plenty of situations in which that critique is needed. Here’s one: researchers looking at ride data from Chicago’s public data portal found that the dynamic pricing algorithms used by ride-hailing services (such as Uber and Lyft) charged more for rides to and from low-income, nonwhite areas. This effect might not have been discovered without machine learning. It means that it’s time to audit the services themselves, and find out exactly why their algorithms behave this way. And it’s an opportunity to learn what stories the data is telling us.
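As a rough illustration of what a first pass at such an audit might look like, the sketch below compares the average fare per mile across neighborhood categories. The field names (`area_group`, `fare`, `miles`) and every value in the sample are hypothetical, invented for illustration; they are not taken from Chicago’s data portal, and this is not the researchers’ actual method.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trip records; all values are made up for illustration only.
trips = [
    {"area_group": "A", "fare": 12.0, "miles": 4.0},
    {"area_group": "A", "fare": 9.0, "miles": 3.0},
    {"area_group": "B", "fare": 14.0, "miles": 4.0},
    {"area_group": "B", "fare": 10.5, "miles": 3.0},
]

def fare_per_mile_by_group(records):
    """Return the mean fare per mile for each area group."""
    per_group = defaultdict(list)
    for record in records:
        if record["miles"] > 0:  # skip trips with no recorded distance
            per_group[record["area_group"]].append(record["fare"] / record["miles"])
    return {group: mean(values) for group, values in per_group.items()}

rates = fare_per_mile_by_group(trips)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.2f} per mile")
```

A gap between groups in a comparison like this is only a starting point. It shows that the prices differ, not why, and the why is the part that requires looking at the service and the history behind the data.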
The real issue is which of those stories we choose to tell. I use the word “we” because the data doesn’t tell a story on its own, any more than a pixelated image of President Obama becomes a White man on its own. Someone chooses what story to tell; someone releases the software; and that someone is a person, not an algorithm. So if we really want to get to the bottom of the upsampling problem with PULSE, we need to be looking at people in addition to training data. If PULSE needed more images of Black people in its training set, why didn’t it have them? And why are we not surprised that these issues show up all the time, in applications ranging from COMPAS to the Google app that tagged Black people as gorillas?
That’s really a question about the teams of people who are creating and testing this software. They are predominantly White and male. I admit that if I wrote a program that upsampled images, it might not occur to me to test a Black person’s face. Or to test whether jail sentences for White and Black people are comparable. Or to test whether a real estate application will recommend that Black people consider buying homes in largely White neighborhoods. These not-so-microaggressions are the substance from which greater abuses of power are made. And we’re more likely to discover those microaggressions in time to stop them if the teams developing the software include people with Black and Brown faces, as well as White ones.
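One way a team might make that kind of testing routine is to report accuracy separately for each group rather than as a single overall number. The sketch below is a minimal, hypothetical example: the group labels and the evaluation results are invented, and a real evaluation would need far more data and careful thought about how the groups are defined.

```python
from collections import Counter

def accuracy_by_group(examples):
    """Compute accuracy per group from (group, predicted, actual) tuples."""
    total = Counter()
    correct = Counter()
    for group, predicted, actual in examples:
        total[group] += 1
        if predicted == actual:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}

# Hypothetical evaluation results, invented for illustration only.
results = [
    ("group_1", "match", "match"),
    ("group_1", "match", "match"),
    ("group_1", "no_match", "no_match"),
    ("group_1", "match", "match"),
    ("group_2", "match", "no_match"),
    ("group_2", "match", "match"),
    ("group_2", "no_match", "match"),
    ("group_2", "match", "match"),
]

per_group = accuracy_by_group(results)
overall = sum(p == a for _, p, a in results) / len(results)

print(f"overall: {overall:.2f}")
for group, acc in sorted(per_group.items()):
    print(f"{group}: {acc:.2f}")
```

In this toy data the overall figure looks respectable while one group fares much worse, which is exactly the kind of gap a single aggregate number hides. Someone still has to think to ask the question, and that depends on who is on the team.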
The problem isn’t limited to building teams that realize we need different training data, or that understand the need for testing against different kinds of bias. We also need teams that can think about what applications should and shouldn’t be built. Machine learning is complicit in many power structures. Andrew Ng’s newsletter, The Batch, gives payday lending as an example. An application might compute the optimal interest rate to charge any customer, and that app might easily be “fair” by some mathematical standard, although even that is problematic. But the industry itself exists to take advantage of vulnerable, low-income people. In this situation, it is impossible for an algorithm, even a “fair” one, to be fair. Likewise, given the current power structures, it is very difficult to imagine a face recognition application, no matter how accurate, that isn’t subject to abuse. Fairness isn’t a mathematical construct that can be embodied by an algorithm; it has everything to do with the systems in which the algorithm is embedded. The algorithms used to identify faces can also be used to identify bird species, detect diseased tomatoes on a farm, and the like. The ethical problem isn’t the algorithm; it’s the context and the power structures, and those are issues that most software teams aren’t used to thinking about.
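To make “fair by some mathematical standard” concrete, here is a minimal sketch of one commonly used standard, demographic parity, which compares the rate of favorable decisions across groups. The group names and decisions are hypothetical, invented for illustration, and demographic parity is only one of several competing definitions.

```python
def approval_rate(decisions):
    """Fraction of favorable decisions (1 = approved, 0 = denied)."""
    return sum(decisions) / len(decisions)

def demographic_parity_gap(decisions_by_group):
    """Return (largest difference in approval rates, per-group rates)."""
    rates = {g: approval_rate(d) for g, d in decisions_by_group.items()}
    return max(rates.values()) - min(rates.values()), rates

# Hypothetical lending decisions, invented for illustration only.
decisions = {
    "group_1": [1, 1, 0, 1],
    "group_2": [1, 0, 1, 1],
}

gap, rates = demographic_parity_gap(decisions)
print(f"approval rates: {rates}")
print(f"parity gap: {gap:.2f}")
```

A gap of zero means this particular check passes. It says nothing about whether the loans being approved harm the people who receive them, which is the point: the metric measures the algorithm, not the system around it.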
There’s an even better example close at hand: PULSE itself. Obscuring faces by replacing them with low-resolution, pixelated images is a classic way of protecting the identity of the person in the photograph. It’s something people do to protect themselves, a particularly important issue in these days of incognito armies. Software like PULSE, even (especially) if it is trained correctly, undoes individuals’ efforts to protect their own privacy. It tips the power relationship even further in the direction of the empowered. An application like Stanford’s #BlackLivesMatter PrivacyBot, by contrast, tips the balance back the other way.
There are many ways to address these issues of bias, fairness, and power, but they all start with building inclusive teams, and with taking a step back to look at the bigger issues involved. You are more likely to detect bias if there are people on the team who have been victims of bias. You are more likely to think about the abuse of power if the team includes people who have been abused by power. And, as I’ve argued elsewhere, the job of “programming” is becoming less about writing code, and more about understanding the nature of the problem to be solved. In the future, machines will write a lot of code for us. Our task will be deciding what that software should do—not putting our heads down and grinding out lines of code. And that task isn’t going to go well if our teams are monochromatic.