Chapter 8. The Road Ahead
Data Science Today
Kaggle is a marketplace for hosting data science competitions. Companies post their questions and data scientists from all over the world compete to produce the best answers. When a company posts a challenge, it also posts how much it’s willing to pay to anyone who can find an acceptable answer. If you take the questions posted to Kaggle and plot them by value in descending order, the graph looks like Figure 8-1.
Figure 8-1. The value of questions posted to Kaggle matches a long-tail distribution
This is a classic long-tail distribution. Half the value of the Kaggle market is concentrated in about 6% of the questions, while the other half is spread out among the remaining 94%. This distribution gets skewed even more if you consider all the questions with no direct monetary value—questions that offer incentives like jobs or kudos.
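The concentration figure above can be checked for any list of prize amounts. The following is a minimal sketch, not the method used to produce Figure 8-1: the function name `head_share` and the sample amounts are hypothetical and are not Kaggle data. It sorts the values in descending order and reports the smallest fraction of questions whose combined value reaches half of the total.

```python
def head_share(values, value_fraction=0.5):
    """Return the smallest fraction of items, taken largest-first,
    whose combined value reaches `value_fraction` of the total."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values, reverse=True)   # head of the market first
    target = value_fraction * sum(ordered)   # e.g. half of the total value
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running >= target:
            return count / len(ordered)
    return 1.0


# Hypothetical prize amounts for illustration only (not real Kaggle data):
# one large prize followed by ten small ones.
prizes = [100] + [10] * 10
share = head_share(prizes)
print(f"{share:.0%} of the questions hold half of the value")
```

In a long-tail market this fraction is small, as with the roughly 6% cited above. When every question is worth the same, it comes out at one half.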
I strongly suspect that the wider data science market has the same long-tail shape. If I could get every company to declare every question that could be answered using data science, and what they would offer to have those questions answered, I believe that the concentration of value would look very similar to that of the Kaggle market.
Today, the prevailing wisdom for making money in data science is to go after the head of the market using centralized capabilities. Companies collect expensive resources (like ...