Chapter 2. Fairness and Bias

In this chapter, we dive into bias in machine learning models, define key concepts in bias evaluation and mitigation, and explore several case studies from natural language processing and computer vision settings.


This chapter covers the topic of hate speech and includes graphic discussion of racism and sexism.

Before we discuss definitions of mathematical fairness, let's first look at what bias and its consequences look like in the real world. Note that when we talk about bias in this chapter, we mean societal bias, rather than the bias-variance trade-off in machine learning or inductive biases.


Confusing societal bias with the bias terms of neural networks, some may ask whether this is a problem that can be solved by simply setting the bias terms to 0.

It is, in fact, possible to train large models without bias terms in the dense kernels or layer norms.1 However, this does not solve the problem of societal bias, as the bias terms are not the only source of bias in the model.
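To make the distinction concrete, here is a minimal sketch (not from the book, and using NumPy rather than a deep learning framework) of a dense network defined entirely without additive bias terms. Every layer is a pure matrix multiply, yet any societal bias present in the training data would still flow through the weights unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer dense network with NO additive bias vectors:
# each layer is just a matrix multiply followed by a nonlinearity.
W1 = rng.standard_normal((16, 32)) * 0.1
W2 = rng.standard_normal((32, 16)) * 0.1

def forward(x):
    h = np.maximum(x @ W1, 0.0)  # ReLU activation; note there is no "+ b" anywhere
    return h @ W2

x = rng.standard_normal((4, 16))  # a batch of 4 input vectors
out = forward(x)
print(out.shape)  # (4, 16)
```

Removing the `b` parameters changes the function class slightly, but the model's predictions are still shaped by whatever patterns, fair or unfair, exist in the data it was trained on.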

These case studies serve two purposes. First, they show the potential consequences of a lack of fairness in ML models, and thus why it is important to focus on this topic. Second, they illustrate one of the main challenges of creating fair ML models: human systems, and therefore data, are unfair, so one challenge is building fair ML models from potentially unfair sources.

Case 1: Social Media

When users upload images to Twitter, the ...
