6 Citizen Data Science

Citizen data science as practiced in organizations really consists of two categories of activity. One might be more accurately called citizen data analysis because there isn’t much science to it. Instead, it involves straightforward data analysis, generally with only descriptive statistics, visual analytics, or perhaps a bit of ordinary regression analysis. Few people seem to object to this idea, as it doesn’t require a high degree of quantitative expertise, and creating a dashboard full of bar charts is unlikely to lead to really bad decisions.

The other category might be called real or true citizen data science, because it involves complex data analysis and the use of sophisticated predictive models. This activity has both proponents and conscientious objectors. It requires statistical expertise, and its outputs can be embedded into decision processes, where badly made decisions can lead to a lot of trouble.

Both types usually involve some degree of data wrangling, although the wrangling in true citizen data science is typically more technically challenging. Some people worry about that wrangling because it sometimes leads to the creation of new data and “multiple versions of the truth.” We’ll describe the positive and negative attributes of both citizen data analysis and citizen data science in this chapter, beginning with citizen data analysis.

Citizen Data Analysis

For well over a decade the relatively simple analysis of data has increasingly been performed ...
