Chapter 11. Caution: Data Science Projects Can Turn into the Emperor’s New Clothes

Shweta Katre

The fourth industrial revolution has dawned: the Age of Analytics. There is a mad rush to develop predictive models and algorithms to establish supremacy in an immensely competitive, data-driven world. Starting with predictive analytics and moving into machine learning and artificial intelligence, most companies are expanding their capabilities and launching data science projects.

Like all other kinds of project teams, data science teams are under enormous pressure to deliver business value that is usable and potentially releasable within a specific time frame. A big challenge for these teams is showing visible, measurable progress, which keeps stakeholders interested and funding steady.

However, roughly 80% of project time is spent on data collection and selection, data preparation, and exploratory analysis (see the following figure). Project overheads are huge, yet there is no visible output. The promised predictive model or algorithm is not revealed in the early or even middle stages. Sometimes, in the evaluation or validation stage, the team may have to scrap the entire analysis and go back to the drawing board. In such scenarios, resources have been used up and hours have been burned, but no output has resulted, à la the emperor’s new clothes! ...
