Chapter 4. Fairness Pre-Processing

As discussed in the previous chapter, fairness can affect three stages of the data modeling pipeline. This chapter focuses on the earliest stage, adjusting the way that data is translated into inputs for a machine learning training process, also called pre-processing the data.

The advantages of pre-processing a data set are numerous. For starters, many regard this as the most flexible fairness intervention, because if done well, it can prevent downstream misuse or carelessness leading to discrimination. If the discrimination is removed from the data, there is less of a concern that naive or careless downstream users could go wrong. Additionally, some methods for pre-processing a data set are more intuitive and inspectable than are methods that act during model training (i.e., in-processing).

Because pre-processing is the earliest opportunity for intervening in the data modeling process,1 pre-processing offers the most opportunities for downstream metrics. When pre-processing is the fairness intervention used in the data modeling pipeline,2 fairness metrics can be applied at different stages along the pipeline. For example, we can separately measure both how the pre-processing reduces discrimination in the data and how the pre-processing affects potentially discriminatory outputs of the model trained on the data set.

Because fairness in machine learning remains a relatively young field without a clearly established canon, and because fairness itself is a slippery subject that will always have different definitions in different contexts, there is no single way to pre-process data. What’s more, there is no single way to pre-process data that has been found to be the “best” even with a particular fairness metric in mind. Different interventions perform differently depending on the data set. I’ll start by covering a few pre-processing methods that have gained a following, as indicated by citations to academic papers and by inclusion in a popular AI fairness module, AIF360. In the next three chapters, we will maintain our ties to the AIF360 selection while also sometimes covering methods not included in that module. This module is a good way to start working on issues related to machine learning fairness, so it’s worth your while to gain familiarity with this open source code base. You might even consider contributing to it, as it’s a young project with room to grow.

Simple Pre-Processing Methods

You would likely come up with two simple pre-processing methods on your own if asked to devise a method for pre-processing data to reduce discrimination. One possibility, you might suggest, is to delete any data associated with a sensitive or protected attribute. Don’t want to discriminate by gender? Fine, delete the gender information. Another possibility could be to “right the wrongs” in the data set by relabeling data. Can you identify a specific job candidate who seems to have suffered from racial discrimination in not being hired? Change their label—that is, change their outcome to what it would fairly be if consistent with their merit. I briefly discuss these options before proceeding to a discussion of more-sophisticated methods for pre-processing data to reduce discrimination.

Suppression: The Baseline

In some academic work, suppressing not only sensitive attributes but also attributes that are highly correlated with them has been suggested as a baseline model for removing bias. This can assist with some cases drawn from recent legal history, such as the use of geography as a proxy for race (e.g., redlining) or the use of a high school diploma as a requirement that was highly indicative of race but was not a genuine work requirement.3 The code demonstrations later in this chapter include one example of suppression and its performance relative to other techniques on a specific data set.
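
As a rough illustration of this baseline, the sketch below drops a sensitive column along with any column whose absolute Pearson correlation with it meets a cutoff. The column names, the toy values, and the 0.6 cutoff are all hypothetical choices made for the example; a real project would need to choose the correlation measure and threshold with care.

```python
from statistics import mean, pstdev

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx, sy = pstdev(xs), pstdev(ys)
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def suppress(columns, sensitive, threshold):
    """Copy `columns` without the sensitive column or its close proxies."""
    reference = columns[sensitive]
    kept = {}
    for name, values in columns.items():
        if name == sensitive:
            continue  # drop the sensitive attribute itself
        if abs(pearson(values, reference)) >= threshold:
            continue  # drop columns that act as proxies for it
        kept[name] = values
    return kept

# Hypothetical toy data: "neighborhood" tracks "sex" closely, "years_exp" does not.
toy = {
    "sex":          [1, 1, 1, 0, 0, 0],
    "neighborhood": [1, 1, 1, 0, 0, 1],
    "years_exp":    [2, 7, 4, 3, 8, 4],
}
suppressed = suppress(toy, "sex", threshold=0.6)
print(list(suppressed))
```

Only the weakly correlated column survives here. Note that a pairwise correlation check like this one cannot catch combinations of columns that jointly encode the sensitive attribute.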

Massaging the Data Set: Relabeling

Two distinct propositions are available for relabeling the data. One way to relabel the data is to identify likely unfair/discriminatory decisions and correct these by changing the outcome to what ought to have happened. The other way is to change the labeled sensitive class rather than the outcome, and this can be done either randomly or in a systematic way to correct discrimination. I don’t pursue this second option because such sensitive information is not usually fed into a training algorithm directly, so relabeling it is unlikely to change the outcome. This is because in most cases of importance, considering such protected categories is illegal. Even when inclusion of sensitive variables isn’t illegal, it would often appear motivated by unethical considerations. When considering such categories is neither illegal nor unethical (such as medical algorithms that recognize different incidences of disease in different racial categories), you also wouldn’t want to relabel the sensitive categories, since in such cases they provide critical information that would help those affected by the algorithm, particularly medical patients.

Relabeling outcome data, where we could imagine that data adjustments could attempt to remove some of the unfairness of the real world to teach an algorithm fairer outcomes, could be done in a number of ways. But it’s important to recognize that the judgments as to which points deserve to be relabeled could easily be incomplete or themselves reflect different forms of bias. For example, imagine a society in which racial group A had been historically favored and racial group B had been historically disfavored. Let’s further imagine that some kind of racial segregation had just ended, finally permitting students from racial group B to attend a university from which they had previously been barred. When we look at historical data for that university, we might find that students from racial group A historically succeeded in college if they had a certain set of attributes. One way of trying to make a fair data set could be to take past examples of students from racial group B, all of whom had previously been denied entry, and instead find their nearest neighbor from racial group A, and mark them to have the same outcome.

No doubt, this would result in a substantially fairer data set than the one reflecting true history. After all, it would at least include the possibility that students from group B could be admitted to the university. However, it’s not clear that this data set would be especially fair. For example, the attributes that marked successful students from group A might not be present in any students from group B because of other associated disadvantages. Imagine that in a privileged group of students, organized sports are readily available in terms of financial and social resources to enable participation. Perhaps for this population, participation in such sports can be good proof of opportunities to develop self-discipline. Perhaps it is also the case that those privileged students who have such an opportunity and fail to take it tend to be less self-disciplined than the others, although this itself would still need to reflect many of the other reasons not to participate in an organized sport, such as health problems, or the pursuit of other interests that require self-discipline. In broad strokes, for such a population, participation in an organized sport could be a useful indicator of self-discipline, and failure to participate might call for inquiry into whether the student could give some other proof of having developed good self-discipline. A model built for such a population could consider participation in such sports as a positive factor and, hopefully, leave open the options for other positive factors to allow for cases where such participation was not possible or desirable.

On the other hand, we can imagine that for an unprivileged group—because of financial or social or structural reasons—organized sports are not readily available. This does not mean that students in this pool lack self-discipline, but instead that they are unlikely to have had the opportunity to develop self-discipline via organized sports. For this population, it might not even make sense to include such a variable if participation is a rare opportunity, rather than a common and accessible proxy. So deploying the model built for the privileged group on this unprivileged group would be unfair and inaccurate.

We could go through a variety of situations to show ways this could be unfair. It points to the possibility that human discretion and constant learning about the specific domain would continue to be important, even if we worried that humans were the ultimate source of racial bias. After all, humans could at least reason about things such as the lack of opportunity to join a sports team, whereas existing algorithms cannot. No doubt, coming up with a fair admissions standard would be a work in progress and would also be subject to different ideas about what constitutes fairness. As has been stated time and again, the technical literature provides no clear answer because this is ultimately a philosophical question that has more than one reasonable answer. This example is chosen to show that even with the fairness metrics discussed in the previous chapter in hand, the question of which one to deploy and when, and what would be acceptable in making data meet a fairness criterion, remains a complicated philosophical one. This is particularly true in the face of a history of unfairness rather than a single unfair decision that needs to be fixed in a single data set.

I won’t demonstrate the example of relabeling the data because this is described elsewhere and because it seems unlikely to be a successful way to move forward with data fairness in realistic scenarios. In general, it will be difficult to generate organizational willingness to relabel data.

AIF360 Pipeline

The AIF360 pipeline is one convenient option for learning and experimenting with fairness metrics and interventions. This is because it includes multiple pre-coded fairness interventions for pre-processing data based on widely cited, recent scholarship. It’s also because AIF360 comes with the most widely used data sets in the technical fairness literature. Here are two we will be working with:

COMPAS data set

This is the data set built by ProPublica and associated with its 2016 story on racial bias in the COMPAS program’s recidivism scoring algorithm as assessed by a statistical parity metric on false positives or false negatives. This data set includes around 6,000 data points, with such sensitive attributes as race and sex.

German credit data set

This widely cited data set contains one thousand decisions regarding whether to extend credit to an applicant. It became public in 1994 and includes around 20 inputs and sensitive attributes including sex, age, and citizenship status.

AIF360 is useful because it provides an easy and accessible interface for loading data and running discrimination metrics on that data.

Loading the Data

You can see how easy it is to load the data with a simple example:

from aif360.datasets import GermanDataset

gd = GermanDataset()

If you then print the data set, you can see it’s in a nice, digestible format:

               instance weights features

                                   month credit_amount
instance names
0                           1.0      6.0        1169.0
1                           1.0     48.0        5951.0
2                           1.0     12.0        2096.0
3                           1.0     42.0        7882.0
4                           1.0     24.0        4870.0
...                         ...      ...           ...

               investment_as_income_percentage residence_since
instance names
0                                          4.0             4.0
1                                          2.0             2.0
2                                          2.0             3.0
3                                          2.0             4.0
4                                          3.0             4.0
...                                        ...             ...

               protected attribute
                               age number_of_credits people_liable_for
instance names
0                              1.0               2.0               1.0
1                              0.0               1.0               1.0
2                              1.0               1.0               2.0
3                              1.0               1.0               2.0
4                              1.0               2.0               2.0
...                            ...               ...               ...
               protected attribute             ...
                               sex status=A11  ... housing=A153
instance names                                 ...
0                              1.0        1.0  ...          0.0
1                              0.0        0.0  ...          0.0
2                              1.0        0.0  ...          0.0
3                              1.0        1.0  ...          1.0
4                              1.0        1.0  ...          1.0
...                            ...        ...  ...          ...

               skill_level=A171 skill_level=A172 skill_level=A173
instance names
0                           0.0              0.0              1.0
1                           0.0              0.0              1.0
2                           0.0              1.0              0.0
3                           0.0              0.0              1.0
4                           0.0              0.0              1.0
...                         ...              ...              ...

               skill_level=A174 telephone=A191 telephone=A192
instance names
0                           0.0            0.0            1.0
1                           0.0            1.0            0.0
2                           0.0            1.0            0.0
3                           0.0            1.0            0.0
4                           0.0            1.0            0.0
...                         ...            ...            ...


               foreign_worker=A201 foreign_worker=A202 labels
instance names
0                              1.0                 0.0    1.0
1                              1.0                 0.0    2.0
2                              1.0                 0.0    1.0
3                              1.0                 0.0    1.0
4                              1.0                 0.0    2.0
...                            ...                 ...    ...
[1000 rows x 60 columns]

Because all the categorical variables have been converted to binary columns, we have quite a wide data set. The protected attributes of age and sex in the data set are conveniently labeled as such. If you haven’t noticed that, backtrack and look at the labeling just above the column name at the top of the output. This won’t be the case when you create AIF360 data sets with your own data, but it will be the case when you use one of the existing data sets.
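
To make the widening concrete, here is a minimal sketch of how one categorical column turns into several binary columns named in the `column=value` style seen in the output. It is an illustration written for this passage, not AIF360’s implementation, and the three applicant rows are hypothetical.

```python
def one_hot(rows, column):
    """Expand one categorical column into 0.0/1.0 indicator columns."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # keep every other column unchanged
        new_row = {k: v for k, v in row.items() if k != column}
        # add one indicator column per observed category
        for cat in categories:
            new_row[f"{column}={cat}"] = 1.0 if row[column] == cat else 0.0
        encoded.append(new_row)
    return encoded

# Hypothetical rows using the data set's raw category codes
applicants = [
    {"month": 6.0,  "foreign_worker": "A201"},
    {"month": 48.0, "foreign_worker": "A201"},
    {"month": 12.0, "foreign_worker": "A202"},
]
encoded = one_hot(applicants, "foreign_worker")
for row in encoded:
    print(row)
```

Each category adds a column, which is why a data set with around 20 raw inputs ends up dozens of columns wide.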

The fact that this data comes with the AIF360 package is useful not only because it saves you the trouble of having to download it, but also because it saves you some trouble relating to pre-processing the data. For example, the German credit data set comes with its own idiosyncratic labeling, such as “A201” for foreign worker status, whereas AIF360 provides some utility methods to convert these sorts of values into typical numerical labels. Here’s how to access these utility functions:

from aif360.algorithms.preprocessing.optim_preproc_helpers.data_preproc_functions import (
    load_preproc_data_german,
)

priv_group   = [{'sex': 1}]
unpriv_group = [{'sex': 0}]

## utility function to collapse categories
## according to details of dataset
preproc_gd   = load_preproc_data_german(['sex'])

Printing preproc_gd shows a simplified data set that can be more useful to work with and that has been used throughout the AIF360 tutorials:

               instance weights features
                                         protected attribute
                                     age                 sex
instance names
0                           1.0      1.0                 1.0
1                           1.0      0.0                 0.0
2                           1.0      1.0                 1.0
3                           1.0      1.0                 1.0
4                           1.0      1.0                 1.0
...                         ...      ...                 ...

               credit_history=Delay credit_history=None/Paid
instance names
0                               0.0                      0.0
1                               0.0                      1.0
2                               0.0                      0.0
3                               0.0                      1.0
4                               1.0                      0.0
...                             ...                      ...
               credit_history=Other savings=500+ savings=<500
instance names
0                               1.0          0.0          0.0
1                               0.0          0.0          1.0
2                               1.0          0.0          1.0
3                               0.0          0.0          1.0
4                               0.0          0.0          1.0
...                             ...          ...          ...

               savings=Unknown/None employment=1-4 years employment=4+ years
instance names
0                               1.0                  0.0                 1.0
1                               0.0                  1.0                 0.0
2                               0.0                  0.0                 1.0
3                               0.0                  0.0                 1.0
4                               0.0                  1.0                 0.0
...                             ...                  ...                 ...


               employment=Unemployed labels
instance names
0                                0.0    1.0
1                                0.0    2.0
2                                0.0    1.0
3                                0.0    1.0
4                                0.0    2.0
...                              ...    ...

[1000 rows x 13 columns]

This need not be a canonical data set, but it can be a way to start with a cleaner, well-sized, and well-shaped set of inputs. You can see the decisions made in pre-processing by looking at the source code for this utility function:

def load_preproc_data_german(protected_attributes=None):
    """
    Load and pre-process german credit dataset.

    Args:
        protected_attributes(list or None): If None use all possible protected
            attributes, else subset the protected attributes to the list.

    Returns:
        GermanDataset: An instance of GermanDataset with required pre-processing.
    """
    def custom_preprocessing(df):
        """ Custom pre-processing for German Credit Data
        """

        def group_credit_hist(x):
            if x in ['A30', 'A31', 'A32']:
                return 'None/Paid'
            elif x == 'A33':
                return 'Delay'
            elif x == 'A34':
                return 'Other'
            else:
                return 'NA'

        def group_employ(x):
            if x == 'A71':
                return 'Unemployed'
            elif x in ['A72', 'A73']:
                return '1-4 years'
            elif x in ['A74', 'A75']:
                return '4+ years'
            else:
                return 'NA'

        def group_savings(x):
            if x in ['A61', 'A62']:
                return '<500'
            elif x in ['A63', 'A64']:
                return '500+'
            elif x == 'A65':
                return 'Unknown/None'
            else:
                return 'NA'

        def group_status(x):
            if x in ['A11', 'A12']:
                return '<200'
            elif x in ['A13']:
                return '200+'
            elif x == 'A14':
                return 'None'
            else:
                return 'NA'

        # derive a binary sex column from the combined personal_status codes
        status_map = {'A91': 1.0, 'A93': 1.0, 'A94': 1.0,
                    'A92': 0.0, 'A95': 0.0}
        df['sex'] = df['personal_status'].replace(status_map)

        # group credit history, savings, and employment
        df['credit_history'] = df['credit_history'].apply(lambda x:
                                                 group_credit_hist(x))
        df['savings'] = df['savings'].apply(lambda x: group_savings(x))
        df['employment'] = df['employment'].apply(lambda x: group_employ(x))
        df['age'] = df['age'].apply(lambda x: np.float(x >= 25))
        df['status'] = df['status'].apply(lambda x: group_status(x))

        return df

    # Feature partitions
    XD_features = ['credit_history', 'savings', 'employment', 'sex', 'age']
    D_features = (['sex', 'age'] if protected_attributes is None
                  else protected_attributes)
    Y_features = ['credit']
    X_features = list(set(XD_features)-set(D_features))
    categorical_features = ['credit_history', 'savings', 'employment']

    # privileged classes
    all_privileged_classes = {"sex": [1.0],
                              "age": [1.0]}

    # protected attribute maps
    all_protected_attribute_maps = {"sex": {1.0: 'Male', 0.0: 'Female'},
                                    "age": {1.0: 'Old', 0.0: 'Young'}}

    return GermanDataset(
        label_name=Y_features[0],
        favorable_classes=[1],
        protected_attribute_names=D_features,
        privileged_classes=[all_privileged_classes[x] for x in D_features],
        instance_weights_name=None,
        categorical_features=categorical_features,
        features_to_keep=X_features+Y_features+D_features,
        metadata={ 'label_maps': [{1.0: 'Good Credit', 2.0: 'Bad Credit'}],
                   'protected_attribute_maps': [all_protected_attribute_maps[x]
                                for x in D_features]},
        custom_preprocessing=custom_preprocessing)

This utility function makes use of the standard GermanDataset initializer but provides inputs beyond the defaults for the many arguments the initializer takes. You can use this as a baseline from which to write your own utility function if you would like to add more input features or perhaps change the labels on some of the inputs. Utility functions are provided for all the standard data sets. Also, helpfully, several of these loading functions are tied directly to recent NeurIPS fairness research papers,5 so that this can be a good source of code if you are looking to replicate or extend existing state-of-the-art results and techniques.

Fairness Metrics

Once we have loaded a data set, we can also assess the data itself even before training models, to determine the extent of unfairness we think is represented in the data set. As described in Chapter 3, the technical fairness literature discusses and optimizes multiple fairness metrics, and it’s unlikely that a metric can ever cover every situation. AIF360 recognizes this and provides numerous options. A few of them are demonstrated in this section, showing how easy it is to get a quick report on a data set, either a standard data set that comes with AIF360 or a novel one you are loading to pre-process. I start by creating a BinaryLabelDatasetMetric object, which then includes a number of convenience methods, as shown here:

from aif360.metrics import BinaryLabelDatasetMetric

# Metric for the original dataset
gd_metrics = BinaryLabelDatasetMetric(preproc_gd,
                                      unprivileged_groups = unpriv_group,
                                      privileged_groups   = priv_group)

We can then compute a variety of metrics as shown here:

>>> gd_metrics.consistency()
>>> gd_metrics.disparate_impact()
>>> gd_metrics.statistical_parity_difference()

These metrics are defined in Chapter 3. They are also defined with mathematical notation in the AIF360 documentation. Note that to read this and similar documentation comfortably, you should have taken a basic probability course that exposed you to the notation generally used in probability and statistical definitions.

The US Census Data Set

For the pre-processing examples that follow, we will use a US Census data set often known as the Adult data set. It is available via the UCI Machine Learning Repository as well as through the AIF360 module. This data set is used extensively in the fairness literature. We will be pre-processing it to make it fairer with respect to sex (male or female) when the outcome measured is income as a binary variable of ≤ $50,000 or > $50,000. We can load the data set and see its initial fairness indicators with the following code:

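from aif360.algorithms.preprocessing.optim_preproc_helpers.data_preproc_functions import (
    load_preproc_data_adult,
)

priv_group   = [{'sex': 1}]
unpriv_group = [{'sex': 0}]
census_data  = load_preproc_data_adult(['sex'])

## split_data_trn_vld_tst is the helper function shown later in this section
dset_raw_trn, dset_raw_vld, dset_raw_tst = split_data_trn_vld_tst(
    census_data, priv_group, unpriv_group)

## calculate the metric of interest
metric_raw_trn = BinaryLabelDatasetMetric(dset_raw_trn,
                                         unprivileged_groups = unpriv_group,
                                         privileged_groups   = priv_group)
print("Difference in mean outcomes = %f" %
      metric_raw_trn.mean_difference())
print("Disparate impact = %f" %
      metric_raw_trn.disparate_impact())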

This leads to the following output:

Difference in mean outcomes = -0.19
Disparate impact = 0.36

In case you have forgotten from Chapter 3, the difference in mean outcomes is the mean outcome for the disfavored group minus the mean outcome for the favored group, so a negative value means the disfavored group receives the favorable outcome less often. Ideally, this number should be as close to 0 as possible. Disparate impact refers to the ratio of the success rate for the disfavored or unprivileged group to the success rate of the favored or privileged group. Ideally, this number should be as close to 1 as possible, and as discussed in Chapter 3, values below 0.8 would be, in the domain of employment law, grounds for an investigation into hiring practices by the Equal Employment Opportunity Commission.
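
The two definitions are simple enough to compute by hand. The following sketch does so on ten hypothetical decisions, using plain Python rather than AIF360; the function names mirror the metric names but are written here purely for illustration.

```python
def positive_rate(labels, groups, group_value, favorable=1):
    """Share of a group's members who received the favorable label."""
    members = [y for y, g in zip(labels, groups) if g == group_value]
    return sum(1 for y in members if y == favorable) / len(members)

def mean_difference(labels, groups, unprivileged=0, privileged=1):
    """Unprivileged success rate minus privileged success rate."""
    return (positive_rate(labels, groups, unprivileged)
            - positive_rate(labels, groups, privileged))

def disparate_impact(labels, groups, unprivileged=0, privileged=1):
    """Unprivileged success rate divided by privileged success rate."""
    return (positive_rate(labels, groups, unprivileged)
            / positive_rate(labels, groups, privileged))

# Hypothetical data: 4 of 5 privileged and 1 of 5 unprivileged people succeed.
labels = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0]
groups = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

md = mean_difference(labels, groups)
di = disparate_impact(labels, groups)
print("Difference in mean outcomes = %0.2f" % md)
print("Disparate impact = %0.2f" % di)
```

With success rates of 0.2 and 0.8, the difference is about -0.6 and the ratio about 0.25, well below the 0.8 threshold mentioned above.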

Note that the split was calculated using a small convenience function, built on the split method of AIF360 data set classes, that divides the data into training, validation, and testing sets (70%, 15%, and 15% of the rows, respectively):

def split_data_trn_vld_tst(data_raw, privileged_groups, unprivileged_groups):
    dset_raw_trn, dset_raw_vt = data_raw.split([0.7], shuffle=True)
    dset_raw_vld, dset_raw_tst = dset_raw_vt.split([0.5], shuffle=True)

    return dset_raw_trn, dset_raw_vld, dset_raw_tst
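
The two-step logic (take 70% for training, then halve the remainder) can be sketched on a plain list. This is an illustration of the proportions only; the function name and the fixed seed are assumptions of the example, and AIF360’s own split may round sizes differently.

```python
import random

def split_70_15_15(items, seed=0):
    """Shuffle, then split into 70% train, 15% validation, 15% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)

    # first cut: 70% for training, 30% held back
    n_trn = (7 * len(shuffled)) // 10
    train, rest = shuffled[:n_trn], shuffled[n_trn:]

    # second cut: halve the held-back rows into validation and test
    n_vld = len(rest) // 2
    return train, rest[:n_vld], rest[n_vld:]

trn, vld, tst = split_70_15_15(range(1000))
print(len(trn), len(vld), len(tst))
```

Every row lands in exactly one of the three sets, which is the property that matters when the test set is later used to measure fairness metrics.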

Let’s also examine what this data set looks like so we have an idea of what we’re working with through the rest of this chapter:

               instance weights features
                                         protected attribute
                                    race                 sex Age (decade)=10
instance names
0                           1.0      1.0                 1.0             0.0
1                           1.0      1.0                 1.0             0.0
2                           1.0      1.0                 1.0             0.0
3                           1.0      0.0                 1.0             0.0
4                           1.0      0.0                 0.0             0.0
...                         ...      ...                 ...             ...

               Age (decade)=20 Age (decade)=30 Age (decade)=40
instance names
0                          0.0             1.0             0.0
1                          0.0             0.0             0.0
2                          0.0             1.0             0.0
3                          0.0             0.0             0.0
4                          1.0             0.0             0.0
...                        ...             ...             ...

               Age (decade)=50 Age (decade)=60 Age (decade)=>=70
instance names
0                          0.0             0.0               0.0
1                          1.0             0.0               0.0
2                          0.0             0.0               0.0
3                          1.0             0.0               0.0
4                          0.0             0.0               0.0
...                        ...             ...               ...

               Education Years=6 Education Years=7 Education Years=8
instance names
0                            0.0               0.0               0.0
1                            0.0               0.0               0.0
2                            0.0               0.0               0.0
3                            0.0               1.0               0.0
4                            0.0               0.0               0.0
...                          ...               ...               ...

               Education Years=9 Education Years=10 Education Years=11
instance names
0                            0.0                0.0                0.0
1                            0.0                0.0                0.0
2                            1.0                0.0                0.0
3                            0.0                0.0                0.0
4                            0.0                0.0                0.0
...                          ...                ...                ...

               Education Years=12 Education Years=<6 Education Years=>12
instance names
0                             0.0                0.0                 1.0
1                             0.0                0.0                 1.0
2                             0.0                0.0                 0.0
3                             0.0                0.0                 0.0
4                             0.0                0.0                 1.0
...                           ...                ...                 ...


               labels
instance names
0                 0.0
1                 0.0
2                 0.0
3                 0.0
4                 0.0
...               ...

[48842 rows x 20 columns]

You can see that the protected attributes of race and sex are listed first and that age and education are broken into broad categorical variables, such as whether someone’s age puts them in their thirties, or whether someone has more than 12 years of education. You can also see that a model we could build with this data set would operate on the theory that income should be predicted by education and age. If we were to apply fairness metrics to this model, we would have the operating assumption that race and sex should not be predictive of income, given education and age, even though we know this would likely not be the case, at least in the 21st-century US context (or any previous century in US history, tragically).


Previously, I briefly mentioned suppression—that is, removing the explicit information about membership in a protected category and building a model that ignores this input. As described previously, this is problematic because, in real-world data sets, membership in a protected category is invariably highly correlated with other covariates. For this reason, your default expectation should often be that removal of information about a protected category without other fairness interventions is unlikely to yield less-discriminatory results and may even yield more-discriminatory results than other fairness interventions that do incorporate information about membership in a protected group.

Fairness Performance Metrics

There is no generally accepted way of comparing methods, be it with respect to metrics or data sets. There are some canonical data sets on which fairness algorithms are generally demonstrated, but no theoretical or empirical research indicates that these are necessarily the best or most representative samples. So the observations made in one set of examples may not carry over to other data sets with respect to performance of a particular fairness intervention.

In the following code, the information about the protected category is dropped, and then a model is fitted. Finally, I calculate the same discrimination metrics as used in the previous example to compare the performance of a model where suppression is the only fairness intervention used:

from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

def build_logit_model_suppression(dset_trn,
                                  dset_tst,
                                  privileged_groups,
                                  unprivileged_groups):

    scaler = StandardScaler()
    X_trn  = scaler.fit_transform(dset_trn.features[:, 2:])  # (1)
    y_trn  = dset_trn.labels.ravel()
    w_trn  = dset_trn.instance_weights.ravel()

    lmod = LogisticRegression()
    lmod.fit(X_trn, y_trn,
             sample_weight = w_trn)  # (2)

    dset_tst_pred = dset_tst.copy(deepcopy=True)
    X_tst = scaler.transform(dset_tst_pred.features[:, 2:])  # (3)
    # keep labels as a column vector, the shape AIF360 data sets use
    dset_tst_pred.labels = lmod.predict(X_tst).reshape(-1, 1)

    metric_tst = BinaryLabelDatasetMetric(dset_tst_pred,  # (4)
                                          unprivileged_groups = unprivileged_groups,
                                          privileged_groups   = privileged_groups)
    print("Disparate impact is %0.2f (closer to 1 is better)" %
          metric_tst.disparate_impact())
    print("Mean difference  is %0.2f (closer to 0 is better)" %
          metric_tst.mean_difference())

    return lmod, dset_tst_pred, metric_tst

# reproducibility

sup_lmod, sup_pred, sup_metric = build_logit_model_suppression(
    dset_raw_trn, dset_raw_tst, priv_group, unpriv_group)

(1) The columns containing information about sensitive attributes are dropped from the training data.

(2) The model is fitted to the training data.

(3) The columns containing information about sensitive attributes are dropped from the testing data to produce labels.

(4) The original testing data, containing the sensitive labels, is used to measure metrics of fairness, specifically metrics related to discrimination.

The preceding code produces the following output:

Disparate impact is 0.60 (closer to 1 is better)
Mean difference  is -0.06 (closer to 0 is better)

In this case, this is an improvement compared to the raw data; that is, building a model that suppresses information about the protected category, for this data set and this chosen model, does not exacerbate the disparity in outcomes between groups. This result goes against the warning I have given previously. I have previously mentioned that machine learning researchers, and their predecessors in low-tech HR, admissions, and many other sorts of organizational offices, have found that ignoring protected attributes is not effective at enhancing fairness of outcomes, particularly when fairness is assessed on a group-oriented basis, such as statistical parity. However, this shows that there are exceptions, as with many rules of thumb that apply to data about humans.

If you have free time, you might consider giving some thought to why the advice doesn’t bear out here. For example, what does that imply about the structure of the data? Try generating synthetic data to see if you can prove out any theories you develop.


Reweighting the data set has a number of positive aspects to recommend it as compared to the suppression and relabeling techniques discussed earlier. Suppression is not guaranteed to move a model toward any notion of fairness. Moreover, throwing out data is rarely a good idea, and removing the protected-category labels can even lead to more-discriminatory models. Also, to determine which labels you ought to change via relabeling, you need to have some underlying model of merit, and defining merit is itself a difficult question. So, a method that enables you to avoid having to directly determine merit can be somewhat more automatic and helpful if you want to have justification for a fairness intervention that doesn’t require you to debate the way you define or measure merit. In such a case, reweighting the data is attractive because you are not defining merit, but merely ensuring that success and failure are equally weighted within all subgroups of the population, further lessening the risk that an ML product will treat membership in a protected category, or variables proxying for such information, as a strong signal in the decision-making process. In particular, the risk that a model will develop a logic of [input indicating membership in a protected category] → merit will be diminished, and this will happen through a transparent process.

Relabeling involves taking an active stance in the construction of the data set, and this can look quite a bit like social engineering. We are changing reality by acting as if things happened differently from the way they actually did. We can imagine in many scenarios that people would find this ideologically objectionable, a literal rewriting of history.

Reweighting the data set, on the other hand, does not entail such drastic acts of removing data or rewriting history. Instead, it takes aim at correcting past wrongs by weighting correct cases more heavily and weighting incorrect (that is, discriminatory) cases less. What’s more, reweighting can be much more principled in aiming at an ultimate fairness outcome, most particularly some predefined mathematical notion.

Note that adjusting sampling can be viewed as just another way to adjust weights, and can be appropriate for methods that do not directly take in a weighting vector.
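As a small illustration of that equivalence, the following sketch resamples rows with probability proportional to their weights, using only the Python standard library. The row names, weights, and helper functions are invented for the example and are not part of AIF360.

```python
import random

def sampling_probabilities(weights):
    """Probability of drawing each row on any single draw."""
    total = sum(weights)
    return [w / total for w in weights]

def resample_by_weight(rows, weights, k, seed=0):
    """Draw k rows with replacement, with probability proportional to weight."""
    rng = random.Random(seed)
    return rng.choices(rows, weights=weights, k=k)

# Hypothetical rows and weights: a row with weight 2.5 should appear about
# five times as often in the resampled data as a row with weight 0.5.
rows = ["row_a", "row_b", "row_c"]
weights = [0.5, 1.0, 2.5]
probabilities = sampling_probabilities(weights)
resampled = resample_by_weight(rows, weights, k=1000)
```

A model that cannot accept a weight vector can then be trained on the resampled rows, at the cost of some sampling noise.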

How It Works

Reweighting works by postulating that a fair data set would show no dependence of the outcome on a protected attribute. Hence, it postulates that the joint probability of membership in group G and outcome T, P(group membership G and outcome T), should be equal to P(group membership G) × P(outcome T); that is, group membership and outcome should be statistically independent. Reweighting adjusts the data point weights to make this so.
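The independence condition translates directly into a weight for each combination of group and outcome: the probability expected under independence divided by the probability actually observed. The following is a minimal hand-rolled sketch of that calculation on invented toy data; the function name and the data are mine, and it assumes every point starts with weight 1.

```python
from collections import Counter

def reweighing_weights(groups, outcomes):
    """Return {(group, outcome): weight} such that, after weighting, group
    membership and outcome are statistically independent.

    weight(g, t) = P(g) * P(t) / P(g, t), using empirical frequencies.
    """
    n = len(groups)
    n_group = Counter(groups)
    n_outcome = Counter(outcomes)
    n_joint = Counter(zip(groups, outcomes))
    return {
        (g, t): (n_group[g] * n_outcome[t]) / (n * n_joint[(g, t)])
        for (g, t) in n_joint
    }

# Invented toy data: 10 privileged ("p") and 10 unprivileged ("u") people.
groups = ["p"] * 10 + ["u"] * 10
outcomes = [1] * 8 + [0] * 2 + [1] * 4 + [0] * 6   # 1 = favorable
weights = reweighing_weights(groups, outcomes)

# Weighted share of all data falling in the (privileged, favorable) cell.
total_weight = sum(weights[cell] for cell in zip(groups, outcomes))
p_fav_share = 8 * weights[("p", 1)] / total_weight
```

Here the privileged group receives the favorable outcome 8 times out of 10 and the unprivileged group 4 times out of 10. The resulting weights are 0.75 for privileged-favorable, 2.0 for privileged-unfavorable, 1.5 for unprivileged-favorable, and about 0.67 for unprivileged-unfavorable, and the weighted share of the privileged-favorable cell becomes 0.5 × 0.6 = 0.3, exactly what independence requires.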

Note that this is such a simple procedure that it could be hand-coded. Using AIF360, however, has several advantages:

  • Integration with a larger API, including the data sources and metrics covered in the preceding section.

  • Code from an open source project will have had many eyes on it and more opportunities for error correction than a DIY option.

  • It’s preferable to work within a code base where different processing options can be swapped in and out, rather than pre-committing all code to a particular solution simply because you can code it yourself.

Code Demonstration

The code for reweighting is relatively simple and can be found within the fit and transform methods available on the Reweighing class, which inherits from the more general Transformer class. The API is similar to that of sklearn, so those familiar with the popular sklearn library should adapt quickly to AIF360’s interface.

Let’s consider the fit code:

    def fit(self, dataset):
        """Compute the weights for reweighing the dataset.

        Args:
            dataset (BinaryLabelDataset): Dataset containing true labels.

        Returns:
            Reweighing: Returns self.
        """

        (priv_cond, unpriv_cond, fav_cond, unfav_cond,
        cond_p_fav, cond_p_unfav, cond_up_fav, cond_up_unfav) =\
                self._obtain_conditionings(dataset)

        n = np.sum(dataset.instance_weights, dtype=np.float64) 
        n_p = np.sum(dataset.instance_weights[priv_cond], dtype=np.float64)
        n_up = np.sum(dataset.instance_weights[unpriv_cond], dtype=np.float64)
        n_fav = np.sum(dataset.instance_weights[fav_cond], dtype=np.float64)
        n_unfav = np.sum(dataset.instance_weights[unfav_cond], dtype=np.float64)

        n_p_fav = np.sum(dataset.instance_weights[cond_p_fav], dtype=np.float64) 
        n_p_unfav = np.sum(dataset.instance_weights[cond_p_unfav],
                           dtype=np.float64)
        n_up_fav = np.sum(dataset.instance_weights[cond_up_fav],
                          dtype=np.float64)
        n_up_unfav = np.sum(dataset.instance_weights[cond_up_unfav],
                            dtype=np.float64)

        # reweighing weights
        self.w_p_fav = n_fav*n_p / (n*n_p_fav) 
        self.w_p_unfav = n_unfav*n_p / (n*n_p_unfav)
        self.w_up_fav = n_fav*n_up / (n*n_up_fav)
        self.w_up_unfav = n_unfav*n_up / (n*n_up_unfav)

        return self

self._obtain_conditionings refers to a private class method used to prepare logic vectors indicating which data points fall within which kind of condition. This includes both simple conditions (are they in the privileged set?) and combination conditions (are they in the privileged set and did they receive a favorable outcome?).

These np.sum lines total the data points, adjusted by any custom nonuniform weights that have been supplied, that correspond to the simple binary conditions of privileged or unprivileged and favorable outcome or unfavorable outcome.

These np.sum lines total the data points, adjusted by any custom nonuniform weights that have been supplied, that correspond to the combination conditions that include both group membership and outcome.

This is where the reweightings are calculated. In each case, the numerator divided by n is the weighted count of a particular combination of group membership and outcome that we would expect if group membership and outcome were statistically independent. The remaining factor in the denominator is the count actually observed, which presumably reflects some form of bias. So these weights act to replace the empirical but nonindependent rate with a rate that would be consistent with statistical independence of outcome and group membership.

Once these weights have been calculated, the transform method simply applies these weights to each individual data point. We do not reproduce that code for simplicity, but you can easily read it in the project’s GitHub repository.
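To give a feel for what that step amounts to, here is a sketch of the idea in plain Python. It is an illustration of the logic only, not the library's code; the function name, the dictionary of per-cell weights, and the toy inputs are all hypothetical.

```python
def apply_reweighing(instance_weights, groups, outcomes, cell_weights):
    """Scale each point's existing weight by the weight of its
    (group, outcome) cell. Returns a new list; the input is left unchanged."""
    return [
        w * cell_weights[(g, t)]
        for w, g, t in zip(instance_weights, groups, outcomes)
    ]

# Hypothetical per-cell weights, as a fit step might have produced them.
cell_weights = {("p", 1): 0.75, ("p", 0): 2.0, ("u", 1): 1.5, ("u", 0): 0.5}

# Four toy points; the third already carries a custom weight of 2.0.
instance_weights = [1.0, 1.0, 2.0, 1.0]
groups = ["p", "p", "u", "u"]
outcomes = [1, 0, 1, 0]
new_weights = apply_reweighing(instance_weights, groups, outcomes, cell_weights)
```

Because the new weight multiplies the existing one, any custom weighting supplied with the data is preserved rather than overwritten.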

Thanks to the handy packaging of the AIF360 module, we can then perform the reweighting with a few lines of code:

## transform the data set
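from aif360.algorithms.preprocessing import Reweighing

RW = Reweighing(unprivileged_groups = unpriv_group,
                privileged_groups   = priv_group)
RW.fit(dset_raw_trn)   # learn the four weights from the training data
dset_rewgt_trn = RW.transform(dset_raw_trn)

## calculate the metric of interest
metric_rewgt_trn = BinaryLabelDatasetMetric(dset_rewgt_trn,
                                         unprivileged_groups = unpriv_group,
                                         privileged_groups   = priv_group)
print("Difference in mean outcomes = %f" %
      metric_rewgt_trn.mean_difference())
print("Disparate impact = %f" %
      metric_rewgt_trn.disparate_impact())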

This yields an output that looks perfect, in the sense that both group fairness metrics now show exact parity on the reweighted training data:

Difference in mean outcomes = -0.0 ## 0 is desirable to minimize difference between groups
Disparate impact = 1.0 ## 1 is desirable as this is a ratio of the groups rather than a difference
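To see why reweighting drives both numbers to their ideal values, it helps to compute the two metrics by hand with the weights included. The following sketch uses invented toy data and hand-computed weights; the function names are mine, and the calculation is a simplified stand-in for what a weighted metric class would do.

```python
def weighted_favorable_rate(labels, groups, weights, group, favorable=1):
    """Weighted share of favorable labels within one group."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    fav = sum(w for y, g, w in zip(labels, groups, weights)
              if g == group and y == favorable)
    return fav / total

def mean_difference(labels, groups, weights, unpriv, priv):
    """Unprivileged rate minus privileged rate (0 means parity)."""
    return (weighted_favorable_rate(labels, groups, weights, unpriv)
            - weighted_favorable_rate(labels, groups, weights, priv))

def disparate_impact(labels, groups, weights, unpriv, priv):
    """Unprivileged rate divided by privileged rate (1 means parity)."""
    return (weighted_favorable_rate(labels, groups, weights, unpriv)
            / weighted_favorable_rate(labels, groups, weights, priv))

# Toy data: privileged group favorable 8/10, unprivileged group 4/10.
groups = ["p"] * 10 + ["u"] * 10
labels = [1] * 8 + [0] * 2 + [1] * 4 + [0] * 6
uniform = [1.0] * 20
# Weights worked out by hand for this toy data from P(g) * P(t) / P(g, t).
reweighted = [0.75] * 8 + [2.0] * 2 + [1.5] * 4 + [2.0 / 3.0] * 6

md_before = mean_difference(labels, groups, uniform, "u", "p")
di_before = disparate_impact(labels, groups, uniform, "u", "p")
md_after = mean_difference(labels, groups, reweighted, "u", "p")
di_after = disparate_impact(labels, groups, reweighted, "u", "p")
```

With uniform weights the mean difference is -0.4 and the disparate impact is 0.5; with the reweighted points both groups have a weighted favorable rate of 0.6, so the mean difference is 0 and the disparate impact is 1, up to floating-point rounding.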

Now we have a data set in which we have reweighted points to make group-oriented fairness metrics come out even. We have pre-processed the data not by changing labels or changing the vector space in which we represent inputs, but merely by adjusting the weighting of data points. The weightings can be one of four values, one for each combination of group membership (favored group or disfavored group) and outcome (favorable or unfavorable). We can verify that we have only four weightings as follows:

>>> set(dset_rewgt_trn.instance_weights)
{0.7868545496506331, 0.853796632203106, 1.0923452329973244, 2.2157900832376507}

Not surprisingly, two of the instance weights are < 1 (likely those with unfavorable outcomes for the unprivileged group and favorable outcomes for the privileged group) and two of the instance weights are > 1 (likely those with favorable outcomes for the unprivileged group and unfavorable outcomes for the privileged group).

The test of the method, however, is not whether it gives sensible weightings but whether the pre-processed data, when passed through a model training process, leads to an end product that is fairer than a model trained on unprocessed data. In this chapter we will train a logistic regression model and compare the model trained on the unprocessed data to the model trained on the reweighted data. As a reminder, I emphasized in preceding chapters that no single metric would establish a model as fair or not fair. Here we will use two metrics to assess models, consistency (a measure of individual fairness) and disparate impact (a measure of group fairness). These were defined in Chapter 3’s list of fairness metrics.
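As a refresher on the individual fairness metric, the sketch below computes a consistency score by comparing each prediction with the predictions of its nearest neighbors. It is a simplified illustration on invented one-dimensional data: the function is my own, and as a simplifying assumption a point is not counted as its own neighbor, which may differ from how a library implementation handles it.

```python
def consistency(features, predictions, k=2):
    """1 - mean |y_i - mean prediction among the k nearest neighbors of i|.

    Neighbors are found by squared Euclidean distance.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    total = 0.0
    for i, (x_i, y_i) in enumerate(zip(features, predictions)):
        others = [j for j in range(len(features)) if j != i]
        nearest = sorted(others, key=lambda j: sq_dist(x_i, features[j]))[:k]
        neighbor_mean = sum(predictions[j] for j in nearest) / len(nearest)
        total += abs(y_i - neighbor_mean)
    return 1.0 - total / len(features)

# Two tight clusters of similar individuals (invented data).
features = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
same_for_similar = [1, 1, 1, 0, 0, 0]    # similar people, same prediction
mixed_for_similar = [1, 0, 1, 0, 1, 0]   # similar people, different predictions

score_same = consistency(features, same_for_similar)
score_mixed = consistency(features, mixed_for_similar)
```

When similar individuals all receive the same prediction the score is 1.0; when predictions alternate within each cluster it falls to one third on this toy data.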

We will train our model on the raw data only once, and record its metrics. Keep these in mind when we look at other examples. First, let’s introduce a convenience function that wraps all our training and reporting functionality:

from aif360.metrics import ClassificationMetric

def build_logit_model(dset_trn,
                      dset_tst,
                      privileged_groups,
                      unprivileged_groups):

    scaler = StandardScaler()
    X_trn  = scaler.fit_transform(dset_trn.features)
    y_trn  = dset_trn.labels.ravel()
    w_trn  = dset_trn.instance_weights.ravel()

    lmod = LogisticRegression()
    lmod.fit(X_trn, y_trn,
             sample_weight = w_trn)

    dset_tst_pred = dset_tst.copy(deepcopy=True)
    X_tst = scaler.transform(dset_tst_pred.features)
    y_tst = dset_tst_pred.labels
    # score = predicted probability of the favorable class
    fav_col = np.where(lmod.classes_ == dset_trn.favorable_label)[0][0]
    dset_tst_pred.scores = lmod.predict_proba(X_tst)[:, fav_col].reshape(-1,1) 

    # Boolean mask (not integer indices), so that ~ selects the other rows
    fav_mask = lmod.predict(X_tst) == dset_trn.favorable_label 
    dset_tst_pred.labels[fav_mask] = dset_tst_pred.favorable_label
    dset_tst_pred.labels[~fav_mask] = dset_tst_pred.unfavorable_label

    metric_tst = ClassificationMetric(dset_tst, dset_tst_pred,
                                    unprivileged_groups = unprivileged_groups,
                                    privileged_groups   = privileged_groups) 
    print("Consistency is      %f (closer to 1 is better)"
           % metric_tst.consistency())
    print("Disparate impact is %f (closer to 1 is better)" %
           metric_tst.disparate_impact())

    return lmod, dset_tst_pred, metric_tst

Use the logistic regression model fit on either the raw or pre-processed training data to generate predictions for the test data.

Determine which indices in the predictions correspond to the favorable and unfavorable label and add these to the prediction data set accordingly.

Feed the data to a handy metric class built to compare data and predicted labels for that data and output two useful fairness metrics.

Applying this first to train a model on raw data, we see the following metrics:

>>> ## raw training data
>>> raw_lmod, raw_pred, raw_metric = build_logit_model(dset_raw_trn,
...                                                    dset_tst=dset_raw_tst,
...                                                    privileged_groups=priv_group,
...                                                    unprivileged_groups=unpriv_group)
Disparate impact is 0.00 (closer to 1 is better)
Mean difference  is -0.22 (closer to 0 is better)

Wow, this looks pretty bad. In fact, this shows how a situation that is already unfair as reflected in the underlying data can become even worse when a model is trained on this data.

Modeling Can Exacerbate Bias

If we computed the same fairness measures on the raw data set itself, rather than on a model’s predictions, we would not necessarily get the same figures. In fact, some papers even study what additional unfairness the act of modeling may introduce to worsen a biased data set. This has been studied, for example, in models that label images or produce images based on labels, whereby a disparity (say, in gender representation in the real world) is worsened in the compilation of a data set and then worsened still more in the training of an image recognition model. So even if at the end of your studies in fairness you were to conclude that you’d rather use minimal or no intervention (unlikely to lead to anything like a fair outcome in many situations), you might introduce bias or additional bias merely by the act of training a model in a non-fairness-aware manner.

If you come away with nothing more from this book, remember that models can exacerbate the disparities represented in unfair data sets.
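A deliberately extreme toy calculation shows how this can happen. Suppose the only feature available is group membership and the favorable outcome is somewhat more common in one group than the other. The numbers and the "model" below are invented for illustration and are not taken from the Adult data set.

```python
# Invented numbers: the favorable-outcome rate in the data for each group.
base_rate = {"privileged": 0.60, "unprivileged": 0.40}

def accuracy_maximizing_prediction(rate):
    """With no other features, predicting the majority outcome for a group
    maximizes accuracy, so the prediction is 1 whenever the rate exceeds 0.5."""
    return 1 if rate > 0.5 else 0

# Disparity already present in the data.
data_disparate_impact = base_rate["unprivileged"] / base_rate["privileged"]

# Disparity in the predictions of the accuracy-maximizing rule.
predicted_rate = {g: accuracy_maximizing_prediction(r)
                  for g, r in base_rate.items()}
model_disparate_impact = (predicted_rate["unprivileged"]
                          / predicted_rate["privileged"])
```

The data has a disparate impact of about 0.67, but the accuracy-maximizing rule predicts the favorable outcome for everyone in the privileged group and no one in the unprivileged group, for a disparate impact of 0. Real models have more features and are rarely this stark, but the same pull toward the majority outcome within each group is at work.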

We can also evaluate a model trained on the data that has been pre-processed using reweighting. We apply the same convenience method to the reweighted data and see the following output:

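>>> ## fairness pre-processed data
>>> rewgt_lmod, rewgt_pred, rewgt_metric = build_logit_model(dset_rewgt_trn,
...                                                          dset_tst=dset_raw_tst,
...                                                          privileged_groups=priv_group,
...                                                          unprivileged_groups=unpriv_group)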
Disparate impact is 0.66 (closer to 1 is better)
Mean difference  is -0.07 (closer to 0 is better)

Interestingly, this is quite a bit better! But this is still far from fair if we are defining fairness in terms of statistical parity, since the groups are far from equal. Note, however, that we do better this way than with mere suppression as assessed by the disparate impact metric.

Learning Fair Representations

Let’s now turn to a technique that involves a fundamentally different way of pre-processing data to be fair. We examine a method proposed in a paper called “Learning Fair Representations.”6 The techniques described in the preceding section largely focus on enhancing group fairness by enforcing statistical parity. However, the two more sophisticated techniques described in this section and the next recognize the tension between group fairness and individual fairness, and so aim to develop techniques that recognize the importance of both kinds.

How It Works

Learned fair representations aim for a middle ground between group fairness and individual fairness by turning fairness pre-processing into an optimization problem, where different terms in the optimization relate to group fairness and individual fairness. A third term represents a typical loss function and so relates to accuracy.7

So far I’ve described the goals of the optimization, but I have not discussed what is being optimized—namely, mathematical representations of the data. The idea behind transforming the data from the original inputs into an alternative representation is to minimize the amount of information regarding membership in a protected category that is present in the transformed representation, while maximizing all other information present in the original data. In the words of Zemel et al., they sought to obfuscate the protected category while encoding the other information.

There are many imaginable ways to do this. Learned fair representations do this in a way that maintains the data in the original vector space but collapses it down to a set of “prototypes” of the input, v_k, together with parameters w_k that map each prototype to a predicted outcome.

The optimization’s goal is then to learn the appropriate locations for the prototypes and the appropriate mappings from the inputs to the prototypes. Here I describe the overall expression to be optimized in broad strokes in order to avoid getting into the notational weeds. The authors of the paper seek to optimize an equation with hyperparameters for the relative weightings for expressions corresponding to group fairness, individual fairness, and accuracy. The authors point out that these hyperparameters can be tweaked either according to domain knowledge or according to an overall performance metric, and they optimize to two fairness-aware performance metrics: (1) minimizing discrimination and (2) maximizing the difference between accuracy (desired to be higher) and discrimination (desired to be lower).
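For readers who find code easier to follow than notation, here is a rough sketch of how such a three-term objective could be evaluated for a given set of prototypes. It is a simplified illustration under my own assumptions, not the paper's or AIF360's implementation: the distance is plain squared Euclidean, the names A_x, A_y, and A_z for the relative weightings are placeholders, and no optimization is performed.

```python
import math

def soft_assignments(x, prototypes):
    """Probabilities of x mapping to each prototype: a softmax over
    negative squared distances."""
    scores = [-sum((a - b) ** 2 for a, b in zip(x, v)) for v in prototypes]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def lfr_objective(X, y, protected, prototypes, w, A_x, A_y, A_z):
    """A_z * L_z + A_x * L_x + A_y * L_y for fixed prototypes and weights w."""
    M = [soft_assignments(x, prototypes) for x in X]
    K = len(prototypes)

    # L_z (group fairness): average prototype usage should match across groups.
    def mean_usage(flag):
        rows = [m for m, p in zip(M, protected) if p == flag]
        return [sum(r[k] for r in rows) / len(rows) for k in range(K)]
    L_z = sum(abs(a - b) for a, b in zip(mean_usage(1), mean_usage(0)))

    # L_x (information kept): squared error reconstructing x from prototypes.
    L_x = 0.0
    for x, m in zip(X, M):
        x_hat = [sum(m[k] * prototypes[k][d] for k in range(K))
                 for d in range(len(x))]
        L_x += sum((a - b) ** 2 for a, b in zip(x, x_hat))

    # L_y (accuracy): cross-entropy of predictions y_hat = sum_k M[k] * w[k].
    L_y = 0.0
    for m, target in zip(M, y):
        y_hat = sum(mk * wk for mk, wk in zip(m, w))
        y_hat = min(max(y_hat, 1e-9), 1 - 1e-9)
        L_y += -target * math.log(y_hat) - (1 - target) * math.log(1 - y_hat)

    return A_z * L_z + A_x * L_x + A_y * L_y

# Toy data in which both groups have identical inputs, so L_z is zero.
X = [(0.0,), (1.0,), (0.0,), (1.0,)]
y = [0, 1, 0, 1]
protected = [1, 1, 0, 0]
prototypes = [(0.0,), (1.0,)]
w = [0.1, 0.9]
fairness_only = lfr_objective(X, y, protected, prototypes, w, 0.0, 0.0, 1.0)
full = lfr_objective(X, y, protected, prototypes, w, 0.01, 1.0, 50.0)

# Toy data in which the groups occupy different regions, so L_z is large.
X_split = [(0.0,), (0.0,), (1.0,), (1.0,)]
split = lfr_objective(X_split, y, protected, prototypes, w, 0.0, 0.0, 1.0)
```

The group fairness term is zero when the two groups use the prototypes in the same proportions and grows when they do not, which is what pushes the learned representation to hide group membership.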

As with the choice of what fairness metric to choose, the balance of decisions made in an optimization problem still involves fairness norms and ultimately still relies on you, the operator, to determine the appropriate hyperparameters to build a fair pre-processed data set. These decisions will likely vary depending on your domain knowledge relating to the societal and personal costs paid for different kinds of wrong decisions and how that might impact how you balance accuracy and antidiscrimination efforts.

Let’s consider two real-world use cases in which we might need to balance accuracy and antidiscrimination. First, consider building a medical treatment tool for brain cancer. I am completely fabricating this example and don’t imagine it is true, but imagine that for some reason the probability of successful treatment for men differed strongly from that probability for women, with the solution found after extensive ML modeling and training. We might be concerned about this from an antidiscrimination perspective (if we could find no biological explanation) and so insert an antidiscrimination penalty. However, given the life-or-death stakes, we would probably not want to put in a very strong antidiscrimination penalty. Most people would not find it fair to reduce discriminatory outcomes if it meant that more people died of brain cancer overall. In fact, how such a decision would be made and who would make that decision itself is a complicated ethical question that other entities, such as a hospital ethics review board, would be better positioned to make than would an ML engineer. However, the bottom line is that in this example most decision makers would likely include a small antidiscrimination penalty and put more weight on accuracy and obtaining good outcomes.

Consider as a second example university admissions. My 17-year-old self thought of this as a defining moment in life, where the “accuracy” in determining the “best” and “most deserving” student (obviously me, I would now say with irony) should override any concerns about group parity. Luckily, the society in which I live had already come to a wiser conclusion and at the time already recognized the need both to promote diversity in elite universities and to recognize the wider difficulties faced by some groups, and so had enacted affirmative action that promoted health and equality and equity. It’s also worth noting that “accuracy” is a misleading idea for purposes of university admissions because there are many competing visions of who the ideal university student is and what their most important attributes are, something 17 year olds and their parents can easily lose sight of.

Some might see this as a penalty to accuracy. (I would argue that even this notion is far from obviously true.)8 In this case, we could frame a calibration of an ML university admissions product as choosing a relatively large antidiscrimination penalty in our weighting of accuracy versus antidiscrimination prerogatives. This could partly result from recognizing that the benefits of equality in this case far outweigh whatever downsides there could be to slightly reduced “accuracy.” Increasing diversity and equality at these institutions has a tremendous effect in sending a clear pro-equality message to the world.

Learned fair representations demonstrate impressive results, particularly with respect to minimizing discrimination. This held when assessed on the performance of a downstream model, which showed only a fairly small reduction in accuracy compared to models trained on the raw data. What’s more, Zemel et al. also showed that attempts to predict membership in a protected group from the pre-processed data showed relatively low accuracy, suggesting that much of the information regarding protected attributes had indeed been removed, as was desired. Choice of hyperparameters could even lead to greater suppression of the information, depending on the algorithm operator’s priorities.

Advantages of learned fair representations are numerous. First, they can serve to develop a tool that can be deployed for a data release, even supposing that downstream users will not be interested in enhancing fairness. Second, they result from a relatively simple optimization problem that can be run on a standard laptop for small or medium data sets. Third, they can be used for transfer learning; that is, even if the optimization was developed to protect a particular category with respect to a particular outcome, models can be built on the same representations to predict other outcomes in a way that also enhances downstream fairness of those models.

Code Demonstration

Since you have already seen an example of how the AIF360 pre-processing pipeline works, the following code should be relatively readable. Note, however, that now test data has to be transformed as well as training data. This is because the model is now trained on a different representation of the data to which the original inputs were mapped. That is, now all data has to be pre-processed, not just the training data, because the representation itself has changed.

An additional wrinkle is that, unlike reweighting, learned fair representations provide a way not to go all in. In particular, a threshold hyperparameter can be set to indicate how strongly the operator wants to remove all information that could be related to discrimination. This is important because, as the authors acknowledge, unfortunately membership in a protected group can sometimes be informative in predicting an outcome.9 For this reason, the algorithm does provide a way for the operator to tune the data transformation when there is a concern that learned fair representations may remove too much useful information while pursuing the nondiscrimination goal.

First, we fit the LFR object, as is standard with the AIF360 pipeline:

from aif360.algorithms.preprocessing import LFR

TR = LFR(unprivileged_groups = unpriv_group,
         privileged_groups   = priv_group)
TR = TR.fit(dset_raw_trn)

dset_lfr_trn = TR.transform(dset_raw_trn, threshold = 0.5)
dset_lfr_trn = dset_raw_trn.align_datasets(dset_lfr_trn)

dset_lfr_tst = TR.transform(dset_raw_tst, threshold = 0.5)
dset_lfr_tst = dset_raw_trn.align_datasets(dset_lfr_tst)

metric_op = BinaryLabelDatasetMetric(dset_lfr_trn,
                                      unprivileged_groups = unpriv_group,
                                      privileged_groups   = priv_group)
print("Mean difference:  %0.2f" % metric_op.mean_difference())
print("Disparate impact: %0.2f" % metric_op.disparate_impact())

This produces the following output:

Mean difference:  -0.21
Disparate impact: 0.00

This doesn’t seem to do especially well if we look at the transformed data, but what if we use it to train a model and then apply that model to the raw data?

>>> lfr_lmod1, lfr_pred, lfr_metric = build_logit_model(dset_lfr_trn,
...                                                     dset_tst=dset_lfr_tst,
...                                                     privileged_groups=priv_group,
...                                                     unprivileged_groups=unpriv_group)
Disparate impact is 0.95 (closer to 1 is better)
Mean difference  is -0.02 (closer to 0 is better)

This gives outstanding performance!

Note, however, that we’re not really done with what has been presented. Normally, we would also want to tune the hyperparameters that weight the three expressions contributing to the objective function we seek to minimize: Lx, Ly, and Lz. For example, in the paper describing LFR, the authors indicated that they performed a grid search over potential hyperparameters for these terms and even included grid terms that would allow the weights to go to 0. So, in a real-world application you should not accept the default values of these (as we did by not setting them in our function call) but should instead plan for hyperparameter tuning. You should do the hyperparameter tuning in accord with a metric that can reflect your organization’s decisions as to how to balance fairness and accuracy. For example, in the paper, the authors tuned both to minimize discrimination and, separately, to maximize the difference between accuracy (which we want to maximize) and discrimination (which we want to minimize). Interestingly, both of these methods yielded similar performance metrics among a variety of data sets.
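A grid search of this kind can be sketched in a few lines. The loop below is generic: the evaluate callable is a hypothetical function you would write yourself to fit the pre-processing with the given weightings, train a model on the result, and return accuracy and discrimination on validation data. The grid values and the stand-in scoring function are made up purely to exercise the loop.

```python
from itertools import product

def grid_search(evaluate, grid):
    """Return the (A_x, A_y, A_z) setting with the largest
    accuracy - discrimination, along with that score.

    `evaluate` is a hypothetical user-supplied callable returning
    (accuracy, discrimination) for one setting.
    """
    best_setting, best_score = None, float("-inf")
    for A_x, A_y, A_z in product(grid["A_x"], grid["A_y"], grid["A_z"]):
        accuracy, discrimination = evaluate(A_x, A_y, A_z)
        score = accuracy - discrimination
        if score > best_score:
            best_setting, best_score = (A_x, A_y, A_z), score
    return best_setting, best_score

# Grid values are placeholders; 0 is included so a term can be switched off.
grid = {"A_x": [0.0, 0.01, 0.1], "A_y": [0.1, 1.0], "A_z": [0.0, 10.0, 50.0]}

def fake_evaluate(A_x, A_y, A_z):
    """Stand-in with made-up behavior, used only to exercise the search loop."""
    accuracy = 0.80 - 0.001 * A_z
    discrimination = 0.20 / (1.0 + A_z)
    return accuracy, discrimination

best_setting, best_score = grid_search(fake_evaluate, grid)
```

Swapping the score for the negative of discrimination alone would give the other tuning criterion mentioned above, minimizing discrimination.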

Optimized Data Transformations

Optimized pre-processing is a name for a specific technique to transform the data again in the same vector space as the original data but in a way that makes the data set fairer while also preserving as much nondiscriminatory information about the data as possible. This section describes it in detail and then demonstrates it by means of the AIF360 Python module.

How It Works

Our next method was proposed by Calmon et al. in a paper titled “Optimized Pre-Processing for Discrimination Prevention”.10 This paper defines a probabilistic remapping of the original inputs, information about protected attributes, and labels (X, D, and Y, respectively) to a remapped set of inputs and labels (x̂, ŷ). The remapping occurs in the original vector space of the inputs, meaning that the pre-processed data can still be read in the same form and with the same column labels. The goal of the remapping is to keep the processed data’s distribution (in the sense of probability density function) as close to the original data’s distribution as possible, subject to the following two constraints:

  • The dependence of the transformed outcome, ŷ, on group membership D, is below a preset threshold (pushing the data to model a fairer world where group membership in a protected class does not predict outcomes).

  • The difference in the distribution of (X, Y) and the transformed distribution (x̂, ŷ) is not above some threshold for a given specific group D.

These conditions are enforced for all possible groups, not just for the disfavored group. This leads to a model in which data is adjusted by small amounts, in a way that ensures no single group’s data is adjusted beyond a certain threshold, and that also leads to a fairer outcome by removing dependencies between group membership and outcome.
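The first constraint can be illustrated with a small check. The sketch below measures dependence between two groups' favorable rates p and q as |p/q - 1|, which is my understanding of the form used in the paper and should be treated as an assumption; the helper names and toy data are invented, and the check assumes every group has a nonzero favorable rate.

```python
def favorable_rate(outcomes, groups, group):
    """Share of favorable (1) outcomes within one group."""
    members = [y for y, g in zip(outcomes, groups) if g == group]
    return sum(members) / len(members)

def dependence_gap(p, q):
    """|p/q - 1|: zero when the two rates match."""
    return abs(p / q - 1.0)

def satisfies_parity(outcomes, groups, epsilon):
    """Check every ordered pair of groups, not only the disfavored one."""
    names = sorted(set(groups))
    rates = {g: favorable_rate(outcomes, groups, g) for g in names}
    return all(
        dependence_gap(rates[a], rates[b]) <= epsilon
        for a in names for b in names if a != b
    )

# Toy data: group "a" is favorable 5/10, group "b" is favorable 4/10.
groups = ["a"] * 10 + ["b"] * 10
original = [1] * 5 + [0] * 5 + [1] * 4 + [0] * 6
# A hypothetical transformed outcome in which one label in group "b" flipped.
transformed = [1] * 5 + [0] * 5 + [1] * 5 + [0] * 5

ok_before = satisfies_parity(original, groups, epsilon=0.05)
ok_after = satisfies_parity(transformed, groups, epsilon=0.05)
```

On the original toy outcomes the gap is 0.25 in one direction and 0.2 in the other, so a threshold of 0.05 is violated; after the single flip the rates match and the check passes.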

How are these values to be set? Earlier I mentioned “some threshold” a number of times. This threshold is to be set by the person doing the data transformation. How you set the threshold will depend on your organization’s priorities, the quality of the original data, and the importance of the outcome. Perhaps for some data sets one set of fairness thresholds is appropriate even though it would not be appropriate for another data set. You might even consider using cross-validation to set the thresholds depending on the ultimate outcome you are seeking. In such a case, this would be a multistep process: (1) vary the hyperparameters of the data pre-processing, (2) train a model on the pre-processed data, (3) vary the hyperparameters and compare.

As with earlier discussions, I avoid getting into the weeds of notation, but the paper is quite accessible if you want more details.

Code Demonstration

The input parameters as needed are embodied in the following code, used to run the AIF360 implementation of Calmon et al.’s optimized pre-processing algorithm. These parameters indicate, among other things, a way of measuring distortion (distortion_fun)—that is, the way of calculating how different a data point is from a proposed perturbation to that data point, in a way that allows the data analyst to indicate which sort of perturbations would be acceptable and their relative merits. The parameters also indicate the various distortion categories (clist) as indicated by the return value of the distortion function and acceptable probability maximums for each of these categories occurring (dlist). Finally, a parameter defines the permitted upper level of deviation from a form of statistical parity (epsilon).

While the setup requires more work and settings from us compared to earlier methods described, the pipeline otherwise looks quite similar, as shown here:

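from aif360.algorithms.preprocessing.optim_preproc import OptimPreproc
from aif360.algorithms.preprocessing.optim_preproc_helpers.opt_tools import OptTools
from aif360.algorithms.preprocessing.optim_preproc_helpers.distortion_functions import get_distortion_adult

optim_options = {
    "distortion_fun": get_distortion_adult,
    "epsilon": 0.05,
    "clist": [0.99, 1.99, 2.99],
    "dlist": [.1, 0.05, 0]
}

OP = OptimPreproc(OptTools, optim_options)

OP = OP.fit(dset_raw_trn)

# Transform training data and align features
dset_op_trn = OP.transform(dset_raw_trn, transform_Y=True)
dset_op_trn = dset_raw_trn.align_datasets(dset_op_trn)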

The distortion function refers to how we will measure the distance between the original probability and the new probability. epsilon refers to the first bullet point condition limiting the dependence of the outcome on group membership. The clist parameters refer to the constraints on the distortion metric, and the distortion metric is provided by the function get_distortion_adult, which is a utility function provided in AIF360.
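One way to read clist and dlist together is as paired limits: the probability that a point's distortion exceeds each value in clist should be no more than the matching value in dlist. The sketch below checks that reading against a plain list of distortion values. It is a simplification of my own, since it looks at one overall sample rather than at each original data point separately, and the function name and toy distortion lists are invented.

```python
def distortion_constraints_met(distortions, clist, dlist):
    """For each (c, d) pair, the share of points whose distortion exceeds c
    must be at most d."""
    n = len(distortions)
    for c, d in zip(clist, dlist):
        share_exceeding = sum(1 for value in distortions if value > c) / n
        if share_exceeding > d:
            return False
    return True

clist = [0.99, 1.99, 2.99]
dlist = [0.1, 0.05, 0]

mostly_unchanged = [0.0] * 95 + [1.0] * 5     # 5% of points get penalty 1.0
heavily_changed = [0.0] * 80 + [2.0] * 20     # 20% of points get penalty 2.0
one_forbidden = [0.0] * 99 + [3.0]            # a single disallowed change

ok_mostly = distortion_constraints_met(mostly_unchanged, clist, dlist)
ok_heavy = distortion_constraints_met(heavily_changed, clist, dlist)
ok_forbidden = distortion_constraints_met(one_forbidden, clist, dlist)
```

With these settings, up to 10% of points may incur a penalty of 1.0 or more and up to 5% a penalty of 2.0 or more, while a penalty of 3.0 is never acceptable, because its limit is 0.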

Let’s take a look at that function:

def get_distortion_adult(vold, vnew):
    """Distortion function for the adult dataset. We set the distortion
    metric here. See section 4.3 in supplementary material of
    the Calmon et al. paper for an example

    Note:
        Users can use this as a template to create other distortion functions.

    Args:
        vold (dict) : {attr:value} with old values
        vnew (dict) : dictionary of the form {attr:value} with new values

    Returns:
        d (value) : distortion value
    """

    # Define local functions to adjust education and age
    def adjustEdu(v):
        if v == '>12':
            return 13
        elif v == '<6':
            return 5
        else:
            return int(v)

    def adjustAge(a):
        if a == '>=70':
            return 70.0
        else:
            return float(a)

    def adjustInc(a):
        if a == "<=50K":
            return 0
        elif a == ">50K":
            return 1
        else:
            return int(a)

    # value that will be returned for events that should not occur
    bad_val = 3.0

    # Adjust education years
    eOld = adjustEdu(vold['Education Years'])
    eNew = adjustEdu(vnew['Education Years'])

    # Education cannot be lowered or increased in more than 1 year
    if (eNew < eOld) | (eNew > eOld+1):
        return bad_val

    # adjust age
    aOld = adjustAge(vold['Age (decade)'])
    aNew = adjustAge(vnew['Age (decade)'])

    # Age cannot be increased or decreased in more than a decade
    if np.abs(aOld-aNew) > 10.0:
        return bad_val

    # Penalty of 2 if age is decreased or increased
    if np.abs(aOld-aNew) > 0:
        return 2.0

    # Adjust income
    incOld = adjustInc(vold['Income Binary'])
    incNew = adjustInc(vnew['Income Binary'])

    # final penalty according to income
    if incOld > incNew:
        return 1.0
    else:
        return 0.0

This convenience function provides an implementation that matches published research. This is very helpful for replicating existing research, or for better understanding how a research paper relates to the code that implements it. Notice also that we can use this distortion metric as a starting point for our own changes if we want to define distortion differently. The distortion regulates which data transformations are preferred or disallowed, so this is where we would adjust code to reflect how we think changes to education, age, and so on should be viewed, depending on the data set.

Before we run our convenience function to train a model, we also need to transform our test set. Unlike reweighting, which does not require a transformation of the test set, we do need to transform the test set here because the model itself is trained on a remapped version of the data. Luckily, we can consistently remap the data and can still apply this method in an online fashion because the transform is learned only once, for the training data:11

## Transform testing data
dset_op_tst = OP.transform(dset_raw_tst, transform_Y=True)
dset_op_tst = dset_raw_trn.align_datasets(dset_op_tst)

Once we have transformed the test data as well, we can assess the performance:

>>> ## fairness preprocessed data
>>> op_lmod, op_pred, op_metric = build_logit_model(dset_op_trn,
Disparate impact is 0.63 (closer to 1 is better)
Mean difference  is -0.06 (closer to 0 is better)

We see very good performance on the group fairness metric (disparate impact) and good performance on the individual fairness metric (consistency) as well. If we were unhappy with the balance of group and individual fairness metrics, we could change the epsilon and clist parameters to control how stringent the group and individual fairness conditions are during the optimization.
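To make the two reported group numbers concrete, here is a minimal sketch of how disparate impact and mean difference are conventionally computed from binary predictions. The function name and the toy arrays are illustrative assumptions; this is not the code inside build_logit_model or AIF360, only the standard definitions (ratio and difference of favorable-outcome rates between the unprivileged and privileged groups).

```python
def group_fairness_metrics(y_pred, is_privileged):
    """Return (disparate_impact, mean_difference) for binary predictions.

    Hypothetical helper for illustration only.
    y_pred: iterable of 0/1 predictions (1 = favorable outcome)
    is_privileged: iterable of 0/1 flags (1 = privileged group)
    """
    priv = [y for y, p in zip(y_pred, is_privileged) if p]
    unpriv = [y for y, p in zip(y_pred, is_privileged) if not p]

    # Favorable-outcome rate within each group
    rate_priv = sum(priv) / len(priv)
    rate_unpriv = sum(unpriv) / len(unpriv)

    # Disparate impact is a ratio (1 means parity); it is undefined
    # when the privileged group never receives the favorable outcome.
    disparate_impact = rate_unpriv / rate_priv
    # Mean difference is a difference of rates (0 means parity)
    mean_difference = rate_unpriv - rate_priv
    return disparate_impact, mean_difference


# Toy, made-up predictions: four unprivileged rows, then four privileged rows
y_pred = [1, 0, 0, 0, 1, 1, 0, 0]
is_privileged = [0, 0, 0, 0, 1, 1, 1, 1]

di, md = group_fairness_metrics(y_pred, is_privileged)
print(f"Disparate impact is {di:.2f} (closer to 1 is better)")
print(f"Mean difference  is {md:.2f} (closer to 0 is better)")
```

In this toy case the unprivileged group receives the favorable outcome at half the rate of the privileged group, so the ratio is 0.5 and the difference is -0.25.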

Let’s consider one final point in the optimized transform methodology. As mentioned, a convenience method to calculate the distortion of data points is provided. We are, however, free to adjust this when our domain knowledge or priorities are different from those of Calmon et al. Changing the distortion calculation can also affect the group and individual fairness results, so this is another way you might consider tweaking the performance of downstream models. One example is to make the distortion metric less stringent, which offers a larger space over which to maximize group fairness (likely at the expense of individual fairness or accuracy, or both, since we now allow greater distortion of the data). I cut the code short for space, but the modified portions are included and highlighted in comments:

def get_distortion_adult2(vold, vnew):
    ### ... omitted code ... ###

    # value that will be returned for events that should not occur
    bad_val = 3.0

    # Adjust education years
    eOld = adjustEdu(vold['Education Years'])
    eNew = adjustEdu(vnew['Education Years'])

    # Education cannot be lowered or increased by more than 1 year
    if (eNew < eOld - 1) | (eNew > eOld + 1): ## LESS STRINGENT
        return bad_val

    # adjust age
    aOld = adjustAge(vold['Age (decade)'])
    aNew = adjustAge(vnew['Age (decade)'])

    # Age cannot be increased or decreased by more than 15 years
    if np.abs(aOld-aNew) > 15.0: ## LESS STRINGENT
        return bad_val

    ### ... omitted code ... ###

We can see how straightforward modifications to existing code allow us to inject domain knowledge, institutional preferences, or differing norms into the code base. As long as you understand how a methodology works generally, you will, with some careful reading of open source code, discover ways to make the code your own.
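One way to keep such variations manageable is to turn the hard-coded limits into parameters. The following is a minimal sketch, not part of AIF360: make_distortion and its parameter names are hypothetical, and for brevity it assumes the attribute values are already numeric (the real helper also handles strings such as '>12' and '>=70'). The penalty structure mirrors the function shown earlier.

```python
def make_distortion(max_edu_down=0, max_edu_up=1, max_age_shift=10.0, bad_val=3.0):
    """Build a distortion function with configurable limits (hypothetical helper)."""

    def distortion(vold, vnew):
        # Assumes values are already numeric (no '>12' or '>=70' strings)
        e_old = float(vold['Education Years'])
        e_new = float(vnew['Education Years'])
        # Disallow education changes outside the configured window
        if e_new < e_old - max_edu_down or e_new > e_old + max_edu_up:
            return bad_val

        a_old = float(vold['Age (decade)'])
        a_new = float(vnew['Age (decade)'])
        # Disallow age changes larger than the configured shift
        if abs(a_old - a_new) > max_age_shift:
            return bad_val
        # Penalty of 2 for any permitted change in age
        if a_old != a_new:
            return 2.0

        # Penalty of 1 if income is lowered, otherwise no penalty
        if int(vold['Income Binary']) > int(vnew['Income Binary']):
            return 1.0
        return 0.0

    return distortion


# Defaults reproduce the stricter education rule; the relaxed variant
# additionally allows education to be lowered by one year.
strict = make_distortion()
relaxed = make_distortion(max_edu_down=1)

old = {'Education Years': 10, 'Age (decade)': 30.0, 'Income Binary': 1}
new = {'Education Years': 9, 'Age (decade)': 30.0, 'Income Binary': 1}

print(strict(old, new))   # lowering education is disallowed
print(relaxed(old, new))  # lowering by one year is now permitted
```

Each variant can then be recorded alongside the reasons for choosing its limits, which makes the injected domain knowledge easier to document and review.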

Fairness Pre-Processing Checklist

  • Pre-processing appropriateness evaluation

    • Are there any legal or reporting requirements that might affect how or whether you can perform pre-processing as a fairness intervention?

      • How does fairness pre-processing interact with GDPR or CCPA requirements to show each data subject the data about them? Do you need to prepare the pre-processed data for inspection as well?

    • How will you document the changes made to the raw data by pre-processing?

    • Will you store the raw data, and how will you determine which use cases continue to use raw data as compared to which will use pre-processed data?

    • You need to plan from the start for incorporating the pre-processing into your live production settings. If you cannot do so, pre-processing is not appropriate. Is pre-processing in production feasible, and can it be accomplished with a unified team and model to ensure consistency?

  • Choosing the form of pre-processing

    • If pre-processing is acceptable and feasible in your use case, you still need to choose the form of pre-processing.

    • Will you simply eliminate sensitive data (this can sometimes make an outcome less fair)?

    • Will you relabel data explicitly or simply find a fairer representation of the data?

  • Assessing performance of pre-processing

    • Choose important performance metrics before assessing your model. Performance metrics will include accuracy and other measures of model performance, including one or more metrics that measure fairness of outcomes.

    • Apply performance metrics to a validation set as you tune hyperparameters available in pre-processing methods.

  • Determining good balance of fairness and other performance metrics

    • Use a validation data set to assess potential trade-offs between the fairness metric and accuracy. You can even define the trade-off numerically and plot it across different pre-processing methods or different hyperparameters.

    • Discuss trade-offs with appropriate stakeholders; this should be a transparent and fully documented part of the modeling process.

    • Your chosen hyperparameters and pre-processing method should not only reflect pre-processing validation data performance. You should also have qualitative and well-articulated reasons, expressed in human language rather than code, for the choices you are making, and these reasons should be viable ex ante—that is, even before you see that a model produced valid results.

    • Come up with a reasonable impact assessment to determine the downsides as well as the upsides of pre-processing. Will particular individuals or kinds of individuals be hurt by this? If so, are you confident the harm to these individuals is justified in pursuit of a greater total fairness in your data set? Can you balance individual and group fairness?

  • Rolling out your pre-processing to production

    • Create a schedule for consistently checking the performance of your fairness metrics over time.

      • Is there a chance pre-processing will be less important over time?

      • Can you do ex post impact assessments to see how your model might have affected individuals?

      • Consider a randomized A/B-test-style rollout of your fairness pre-processing so that you can make stronger assertions about the counterfactual scenario when assessing the impact of your fairness pre-processing interventions.

    • Keep an up-to-date list of what models are using pre-processing rather than raw data sets. This should be clearly documented. Dependencies should be strictly accounted for.
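The checklist's suggestion to define the trade-off numerically could look something like the sketch below. The scoring rule, the fairness_weight parameter, and the candidate numbers are all made-up placeholders for illustration; they are not results from this chapter, and the right weighting is something to settle with stakeholders before looking at any results.

```python
def tradeoff_score(accuracy, disparate_impact, fairness_weight=0.5):
    """Hypothetical scalar score; higher is better."""
    # The fairness gap is 0 when disparate impact equals 1 (parity)
    fairness_gap = abs(1.0 - disparate_impact)
    return (1.0 - fairness_weight) * accuracy - fairness_weight * fairness_gap


def pick_best(candidates, fairness_weight=0.5):
    """Return the candidate with the highest trade-off score."""
    return max(
        candidates,
        key=lambda c: tradeoff_score(
            c["accuracy"], c["disparate_impact"], fairness_weight
        ),
    )


# Placeholder validation-set results for two hypothetical configurations
candidates = [
    {"name": "config_a", "accuracy": 0.80, "disparate_impact": 0.40},
    {"name": "config_b", "accuracy": 0.78, "disparate_impact": 0.90},
]

best = pick_best(candidates)
print(best["name"])
```

With equal weighting, the slightly less accurate but much fairer configuration wins; setting fairness_weight to 0 reduces the score to plain accuracy and reverses the choice, which is exactly the kind of sensitivity worth showing to stakeholders.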

Concluding Remarks

If you were careful in working through the code and discussion in this chapter, you might have noticed that we trained the model with the protected attributes included. In fact, we did better when training with the protected attributes included than we did when we suppressed them.

However, one thing I haven’t discussed is whether a fairness intervention makes sense for this model. The purpose of our model was to predict the income of people based on their basic demographic information. If we want to predict the real world, including the likely racial and gender biases that are part of that world, it might not make sense to have a fairness intervention. For example, if we are seeking to identify candidates for low-income assistance programs, we’d want to know realistically who is hardest hit by the problems of low income.

On the other hand, if we are using income as a proxy for something that equates higher income with higher merit for downstream decision making, we would want an intervention because what we would really be interested in is who would likely have a high income in a fairer world than the one our data set models, and pre-processing is one attempt to answer this. For example, if we (probably wrongly) used high income as a proxy for leadership abilities or investment in one’s community or likely success in running for political office, we would certainly want to have a fairness intervention in our model.

1 Apart from ensuring that data itself is selected and collected as fairly as possible, as discussed in Chapter 3.

2 We will discuss the option of multiple interventions in subsequent chapters.

3 Griggs v. Duke Power Co., 401 U.S. 424 (1971).

4 Much political campaign work is based on inferring the political beliefs and behaviors of individuals from little more than gender, ethnic identity, and age, and many of these variables are highly correlated with geography. The week of the 2020 US presidential election prompted discussion of this implicitly; votes from different geographic regions were expected to favor one candidate or the other given their demography.

5 NeurIPS is one of the most prominent international conferences on machine learning, along with other large conferences such as KDD and ICML. Increasingly, such conferences include important submissions on fairness research, and even tutorials and workshops fully devoted to the topic of fairness. I strongly recommend looking at the video presentations and publications of such conferences to stay current on emerging topics in algorithmic fairness.

6 Zemel, Rich, et al. “Learning Fair Representations.” Paper presented at the Proceedings of the 30th International Conference on Machine Learning, Atlanta, Georgia, June 2013.

7 Yes, there can be accuracy even when a model has not been built and we are pre-processing, and this will be explained later.

8 This book is not a disquisition on affirmative action. I simply note that good arguments exist in favor of the idea that affirmative action could increase accuracy by better accounting for the real challenges that members of historically disfavored groups encounter for a variety of reasons, including but not limited to structural inequalities and implicit but unacknowledged biases that can, in turn, affect all other inputs to admissions decisions, such as subjective grading decisions, availability of extracurricular activities, etc.

9 As discussed in Chapter 3, even when this is true, it is still illegal to use this information.

10 Calmon, Flavio, et al. “Optimized Pre-Processing for Discrimination Prevention.” Paper presented at Neural Information Processing Systems 2017, Long Beach, California, 2017.

11 For those who may have statistical qualms regarding the possibility that the test set’s distribution is different from the training set’s distribution, Zemel et al. provide bounds to show such a potential problem is most likely not a problem.
