Chapter 47. The Importance of Building Knowledge in Democratized Data Science Realms

Justin Cochran

Data science tools are becoming more “democratized”: they are being distributed more broadly within organizations, to roles that not too long ago had to request analysis rather than perform it themselves. The tools are also growing more sophisticated in the analysis techniques they offer, the variety of data sources they can connect to, and the ability to share data with people inside the organization and beyond. A primary reason these tools can be democratized so broadly, even as they become more powerful, is that their developers successfully hide the complexity from end users (until a user needs to peel back the layers for specific reasons).

The combination of hidden complexity and sophisticated analyses introduces risks when the data analysis is driving decision making. In a much-too-simple analogy, we trust people to use calculators because they understand basic arithmetic. What happens when the analysis capability is available at the click of a button but the end user does not understand the “arithmetic”? It can potentially open the door to analyses using the wrong techniques, ...
