Chapter 75. Data Science Does Not Need a Code of Ethics
Dave Cherry
This may seem like an odd title for an essay in a book about ethics and data science. But it’s true. Data science does not need a code of ethics. It needs something else (which I’ll reveal shortly).
Ethics is defined as a set of “moral principles that govern a person’s behavior or the conducting of an activity.” Building on that definition, a moral is defined as “a lesson, especially concerning what is right or prudent, derived from a story, a piece of information, or an experience.”
Let’s dig into these definitions through the following components of data science: data, models/tools, and people.
Data is not a person. It is not alive: it cannot make decisions or exhibit behavior on its own. Therefore, one can conclude that data can be neither ethical nor unethical. It is just numbers, facts, attributes, and so on. Data can have biases within it. However, those biases are typically created by people, and the resulting data is still an accurate representation of what happened. Ethics should not be confused with bias.
Data science models and algorithms follow the same logic. Developed by humans, models can introduce bias into data. But again, the model itself is not capable of being ethical or unethical.
That leaves us with people. Data ...