In a recent paper, Hila Gonen and Yoav Goldberg argue that methods for de-biasing language models aren’t effective; they make bias less apparent, but don’t actually remove it. De-biasing might even make bias more dangerous by hiding it, rather than leaving it out in the open. The toughest problems are often the ones you only think you’ve solved.
Language models are based on “word embeddings,” which represent each word as a list of numbers (a vector) learned from how words are used together in human language; words that appear in similar contexts end up with similar vectors. There are some techniques for removing bias from word embeddings. I won’t go into them in detail, but for the sake of argument, imagine taking the texts of every English language newspaper and replacing “he,” “she,” and other gender-specific words with “they” or “them.” (Real techniques are more sophisticated, of course.) What Gonen and Goldberg show is that words still cluster the same way: stereotypically female professions still turn up as closely related (their example is nurse, caregiver, receptionist, and teacher).
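To make the clustering point concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the three-dimensional vectors are invented, the axis labels are hypothetical, and the projection step is a simplified stand-in for real de-biasing techniques, not the methods the paper evaluates. The toy data is deliberately built so that a second, implicit signal remains once the explicit gender direction is removed; it shows the mechanism, not evidence for it.

```python
import math

# Invented toy vectors (assumption, not real embedding data):
# axis 0 = explicit gender signal, axis 1 = an implicit stereotype
# signal, axis 2 = everything else.
TOY = {
    "nurse":        [0.8, 0.9, 0.1],
    "caregiver":    [0.7, 0.8, 0.2],
    "receptionist": [0.6, 0.7, 0.3],
    "engineer":     [-0.8, -0.9, 0.1],
    "programmer":   [-0.7, -0.8, 0.2],
}
GENDER_DIRECTION = [1.0, 0.0, 0.0]  # hypothetical "gender axis"


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def remove_direction(vector, direction):
    """Subtract the component of `vector` that lies along `direction`."""
    scale = dot(vector, direction) / dot(direction, direction)
    return [x - scale * d for x, d in zip(vector, direction)]


def nearest(word, vectors):
    """Return the other word whose vector is most similar to `word`'s."""
    return max(
        (w for w in vectors if w != word),
        key=lambda w: cosine(vectors[word], vectors[w]),
    )


# "De-biased" copy: the explicit gender component is now zero everywhere.
debiased = {w: remove_direction(v, GENDER_DIRECTION) for w, v in TOY.items()}

for word in ("nurse", "programmer"):
    print(word, "->", nearest(word, TOY), "|", nearest(word, debiased))
```

In this toy setup, each word's nearest neighbor is the same before and after the projection, because the implicit axis still separates the two groups.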
I’m not at all surprised at the result. Stereotypes go deeper than pronouns and other obvious indications of gender, and are deeply embedded in the way we use language. Do nurse, teacher, and caregiver cluster together because they’re all about “caring,” and do traditionally masculine professions cluster differently? I suspect the connections are much more complex, but something along those lines is probably happening. It’s not a problem if “caring” professions cluster together—but what about the connections between these words and other words?
Gonen and Goldberg point out that explicit male/female associations aren’t really the issue: “algorithmic discrimination is more likely to happen by associating one implicitly gendered term with other implicitly gendered terms, or by picking up on gender-specific regularities in the corpus by learning to condition on gender-biased words, and generalizing to other gender-biased words (i.e., a resume classifier that will learn to favor male over female candidates based on stereotypical cues in an existing—and biased—resume data set, despite being “oblivious” to gender).” That is, an AI that screens job applications for a “programming” position could be biased against women without knowing anything explicit about gender; it would just know that “programmer” clusters with certain words that happen to appear more often in resumes that come from men.
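The resume example can be sketched in a few lines of Python. This is a toy under stated assumptions, not a real screening system: the hiring history is invented, the tokens `cue_a` and `cue_b` are hypothetical placeholders for words that happen to correlate with gender in past resumes, and the scoring rule (average historical hire rate per token) is chosen only because it is easy to read. Note that the data contains no gender field at all.

```python
from collections import Counter

# Invented hiring history (assumption): (resume tokens, was the person hired?).
# "cue_a" and "cue_b" are hypothetical stand-ins for stereotypical cues.
HISTORY = [
    (["python", "sql", "cue_a"], True),
    (["python", "java", "cue_a"], True),
    (["sql", "java", "cue_a"], True),
    (["python", "sql", "cue_b"], False),
    (["python", "java", "cue_b"], False),
    (["sql", "java", "cue_b"], False),
]


def token_hire_rates(history):
    """For each token, the fraction of past resumes containing it that were hired."""
    seen, hired = Counter(), Counter()
    for tokens, was_hired in history:
        for token in set(tokens):
            seen[token] += 1
            if was_hired:
                hired[token] += 1
    return {token: hired[token] / seen[token] for token in seen}


def score(tokens, rates, default=0.5):
    """Average the learned hire rates of a resume's tokens."""
    values = [rates.get(token, default) for token in tokens]
    return sum(values) / len(values)


RATES = token_hire_rates(HISTORY)

# Two resumes with identical skills that differ only in the proxy cue.
resume_a = ["python", "sql", "cue_a"]
resume_b = ["python", "sql", "cue_b"]
print(score(resume_a, RATES), score(resume_b, RATES))
```

In this invented history the skill tokens carry no information (each was hired half the time), so the only thing separating the two scores is the proxy cue. The scorer reproduces the historical pattern while remaining "oblivious" to gender.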
So let’s ask some other difficult questions. Would de-biasing language around race and ethnicity achieve the same (lack of) result? I would like to see that studied. Robyn Speer’s work, described in “How To Make a Racist AI Without Really Trying,” suggests that de-biasing for race is at least partially successful, though Speer asks: “Can we stop worrying about algorithmic racism? No. Have we made the problem a lot smaller? Definitely.” I hope she’s right; I’m less convinced now than I was a few months ago. I can certainly imagine racial stereotypes persisting even after explicit bias has been removed. What about anti-semitic language? What about other stereotypes? One researcher I know discovered an enterprise security application that was biased against salespeople, who were considered more likely to engage in risky behavior.
We’re uncovering biases that are basic to our use of language. It shouldn’t surprise anyone that these biases are built into the way we communicate, and that they go fairly deep. We can remove gender as a factor in word embeddings, but that doesn’t help much. Turkish, for example, doesn’t have gendered pronouns, a fact that has revealed gender bias in automated translation, where gender-neutral Turkish sentences are translated as gender-specific English sentences. But no one would claim that gender bias doesn’t exist in Turkish; it’s just encoded differently. Likewise, we can remove race and ethnicity as a factor in word embeddings, but that, at best, leaves us with a smaller problem. Language is only a symptom of bigger issues. These biases are part of what we are, and these word associations, including the associations we’d prefer to disown, are part of what makes language work.
The problem we’re facing in natural language processing (as in any application of machine learning) is that fairness is aspirational and forward looking; data can only be historical, and therefore necessarily reflects the biases and prejudices of the past. Learning how to de-bias our applications is progress, but the only real solution is to become better people. That’s more easily said than done; it’s not clear that being more conscious about how we talk will remove these hidden biases, any more than rewriting “he” and “she” as “they” will. And being too conscious of how we talk can easily lead to a constant state of self-censorship that makes conversation impossible, especially the kinds of conversations we need to make progress.
If there’s any one thing that will remove those biases, it is being more aware of how we act. Returning to Gonen and Goldberg’s study of professional stereotypes: the way to change those problematic word embeddings permanently isn’t to tweak the data, but to make hiring decisions that aren’t based on stereotypes, and to treat the people we hire fairly and with respect, regardless of gender (or race or ethnicity). If we act differently, our language will inevitably change to reflect our actions.
I am hopeful that machine learning will help us leave behind a biased and prejudiced past, and build a future that’s more fair and equitable. But machine learning won’t make prejudice disappear by forcing us to rely on data when the data itself is biased. And the myth that the apparent abstraction and mathematical rationality of machine learning is unbiased only lends the prestige of math and science to prejudice and stereotype. If machine learning is going to help, we’ll need to understand that progress is incremental, not absolute. Machine learning is a tool, not a magic wand, and it’s capable of being misused. It can hold a mirror to our biases, or it can hide them. Real progress relies on us, and the road forward will be neither simple nor easy.