97 Things About Ethics Everyone in Data Science Should Know

by Bill Franks
August 2020
Beginner
344 pages
10h 23m
English
O'Reilly Media, Inc.
Content preview from 97 Things About Ethics Everyone in Data Science Should Know

Chapter 33. Rethinking the “Get the Data” Step

Phil Bangayan

My key responsibility as a principal data scientist is creating accurate models, which involves getting appropriate data. Getting the data occurs early in the data science process that was taught to me and to all aspiring data scientists, both today and going back to the late 1990s in the form of CRISP-DM (Cross-Industry Standard Process for Data Mining). After practicing on both the client and vendor sides, I have learned that this step receives insufficient attention, exposing data scientists to traps when they do not understand where the data comes from, misuse data collected for a different purpose, or use proxy data in a possibly unethical manner.

The data science process I learned is similar to the one documented by Joe Blitzstein and Hanspeter Pfister at Harvard: (1) ask an interesting question, (2) get the data, (3) explore the data, (4) model the data, and (5) communicate and visualize the results. Going back to 1997, the similar CRISP-DM process, prominent in customer relationship management, includes the following steps: (1) business understanding, (2) data understanding, (3) data preparation, (4) modeling, (5) evaluation, and (6) deployment. In both of these frameworks, getting the data is the second step and affects every step that follows. Having the wrong data at ...


Publisher Resources

ISBN: 9781492072652