Chapter 7. Auditing a Responsible Data Science Project
In the previous chapter, we worked through the Justification, Compilation, and Preparation steps of the Responsible Data Science (RDS) framework using two examples: a classification task (recidivism prediction) and a regression task (crime-rate prediction). Beyond the traditional technical and logistical steps of a data science project, such as data gathering, data cleaning, and feature engineering, we also considered the broader societal context that often goes ignored in projects like these. Now we transition to creating and auditing our own models for the classification example, keeping a careful eye out for any issues of unfairness that might manifest along the way.
We begin this chapter with a deeper discussion of the tricky issue of fairness: how it is commonly defined, how it can be quantified (the third sketch following the list below gives a small taste), and the limitations of the present-day data science toolkit for addressing it. Afterward, we work through the Modeling and Auditing steps of the RDS framework on the COMPAS classification example developed in the previous chapter, as follows:
MODELING the data:
- First, achieve predictive performance that is useful for the modeling task and sufficiently better than that of a featureless baseline model (the first sketch following this list illustrates such a comparison).
- Then compare black-box and intrinsically interpretable models to quantify the “cost” of interpretability in predictive performance (see the second sketch below).
- Ensure that the final candidate model ...
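To make the first step concrete, below is a minimal sketch of the baseline comparison using scikit-learn's featureless DummyClassifier. The file path compas.csv and the label column two_year_recid are illustrative assumptions standing in for the dataset prepared in the previous chapter; they are not the book's actual code.

```python
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Assumed COMPAS-style table: numeric features plus a binary recidivism label.
df = pd.read_csv("compas.csv")  # hypothetical path
X = df.drop(columns=["two_year_recid"])  # hypothetical label column
y = df["two_year_recid"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Featureless baseline: predicts from the label distribution alone.
baseline = DummyClassifier(strategy="prior").fit(X_train, y_train)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

for name, clf in [("featureless baseline", baseline), ("logistic regression", model)]:
    auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
    print(f"{name}: test AUC = {auc:.3f}")  # the baseline's AUC is 0.5
```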
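For the second step, one way to quantify the “cost” of interpretability is to fit an intrinsically interpretable model and a black-box model on the same split and compare test performance. In this sketch, a shallow decision tree and a gradient-boosting ensemble stand in for those two roles (both are assumptions, not the book's chosen models), and the train/test variables carry over from the previous sketch.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.tree import DecisionTreeClassifier

# X_train, X_test, y_train, y_test come from the previous sketch.
# Intrinsically interpretable: a shallow tree we can print and read in full.
interpretable = DecisionTreeClassifier(max_depth=3, random_state=42)
# Black box: an ensemble whose individual predictions are hard to trace.
black_box = GradientBoostingClassifier(random_state=42)

scores = {}
for name, clf in [("shallow tree", interpretable), ("gradient boosting", black_box)]:
    clf.fit(X_train, y_train)
    scores[name] = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
    print(f"{name}: test AUC = {scores[name]:.3f}")

# The "cost" of interpretability is the gap between the two test scores.
print(f"AUC gap: {scores['gradient boosting'] - scores['shallow tree']:.3f}")
```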
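Finally, as a small preview of how fairness “can be quantified,” the sketch below computes the statistical (demographic) parity difference: the gap in positive-prediction rates across groups, one of many competing group fairness metrics. The helper and its toy inputs are illustrative, not code from this book.

```python
import numpy as np

def statistical_parity_difference(y_pred, group):
    """Largest gap in positive-prediction rates across groups.

    0 means every group is flagged at the same rate; larger values
    indicate greater disparity under this particular metric.
    """
    y_pred = np.asarray(y_pred)
    group = np.asarray(group)
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)

# Toy example: group "a" is flagged at a rate of 0.75, group "b" at 0.25.
y_hat = np.array([1, 1, 0, 1, 0, 1, 0, 0])
grp = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
print(statistical_parity_difference(y_hat, grp))  # 0.5
```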