How to Get a Competitive Advantage Using Data Science
The Standard Story Line for Getting Value from Data Science
Data science already plays a significant role in specialized areas. Being able to predict machine failure is a big deal in transportation and manufacturing. Predicting user engagement is huge in advertising. And properly classifying potential voters can mean the difference between winning and losing an election.
But the thing that excites me most is the promise that, in general, data science can give a competitive advantage to almost any business that is able to secure the right data and the right talent. I believe that data science can live up to this promise, but only if we can fix some common misconceptions about its value.
For instance, here's the standard story line when it comes to data science: data-driven companies outperform their peers—just look at Google, Netflix, and Amazon. You need high-quality data with the right velocity, variety, and volume, the story goes, as well as skilled data scientists who can find hidden patterns and tell compelling stories about what those patterns really mean. The resulting insights will drive businesses to optimal performance and greater competitive advantage. Right?
The standard story line sounds really good. But a few problems occur when you try to put it into practice.
The first problem, I think, is that the story makes the wrong assumption about what to look for in a data scientist. If you do a web search on the skills required to be a data scientist (seriously, try it), you'll find a heavy focus on algorithms. We seem to assume that data science is mostly about creating and running advanced analytics algorithms.
The second problem, I think, is that the story ignores the subtle yet very persistent tendency of human beings to reject things we don't like. We often assume that getting someone to accept an insight from a pattern found in the data is a matter of telling a good story. That's the "last mile" assumption. What often happens instead is that the requester questions the assumptions, the data, the methods, or the interpretation. You end up chasing follow-up research tasks until you either tell your requesters what they already believed or just give up and find a new project.
An Alternative Story Line for Getting Value from Data Science
The first step in building a competitive advantage through data science is having a good definition of what a data scientist really is. I believe that data scientists are, foremost, scientists. They use the scientific method. They guess at hypotheses. They gather evidence. They draw conclusions. Like all other scientists, they create and test hypotheses for a living. Instead of specializing in a particular domain of the world, such as living organisms or volcanoes, data scientists specialize in the study of data. This means that, ultimately, data scientists must have a falsifiable hypothesis to do their job, which puts them on a much different trajectory than the one described in the standard story line.
If you want to build a competitive advantage through data science, you need a falsifiable hypothesis about what will create that advantage. Guess at the hypothesis, then turn the data scientist loose on trying to confirm or refute it. There are countless specific hypotheses you can explore, but they will all have the same general form:
It's more effective to do X than to do Y.
For example:
- Our company will sell more widgets if we increase delivery capabilities in Asia Pacific.
- The sales force will increase their overall sales if we introduce mandatory training.
- We will increase customer satisfaction if we hire more user-experience designers.
To test a hypothesis like this, you first have to describe what you mean by effective. That is, you need some kind of key performance indicator, like sales or customer satisfaction, that defines your desired outcome. Next, you have to specify some action that you believe connects to the outcome you care about. Finally, you need a potential leading indicator, a measure tied to that action, that you've tracked over time. Assembling this data is a very difficult step, and one of the main reasons you hire a data scientist. The specifics will vary, but the data you need will have the same general form shown in Figure 2-1.
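As a rough illustration of that general form, the sketch below stores one record per tracking period and pairs each period's leading indicator with the key performance indicator observed some periods later. The names Observation and lagged_pairs, the monthly period labels, and the one-period default lag are all assumptions made for this example, not something the figure prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One tracked period (hypothetical layout, for illustration only)."""
    period: str               # sortable label, e.g. "2024-01"
    leading_indicator: float  # measure of the action, e.g. number of designers
    kpi: float                # desired outcome, e.g. a satisfaction score


def lagged_pairs(observations, lag=1):
    """Pair each period's leading indicator with the KPI `lag` periods later.

    A leading indicator is expected to move before the outcome does, so the
    pairs are offset in time rather than taken from the same period.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    ordered = sorted(observations, key=lambda o: o.period)
    return [
        (ordered[i].leading_indicator, ordered[i + lag].kpi)
        for i in range(len(ordered) - lag)
    ]
```

Choosing the lag is itself a judgment call: too short and the action has not had time to show up in the outcome, too long and other changes start to muddy the comparison.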
Let's take, for example, our hypothesis that hiring more user-experience designers will increase customer satisfaction. We already control whom we hire. We want greater control over customer satisfaction, the key performance indicator. We assume that the number of user-experience designers is a leading indicator of customer satisfaction. The reasoning is that user-experience design is a skill of our employees, those employees work on client projects, and their performance on those projects influences customer satisfaction.
Once you've assembled the data you need (Figure 2-2), let your data scientists go nuts. Run algorithms, collect evidence, and decide on the credibility of the hypothesis. The end result will be something along the lines of "yes, hiring more user-experience designers should increase customer satisfaction by 10% on average" or "the number of user-experience designers has no detectable influence on customer satisfaction."
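One way to arrive at an answer of that kind is sketched below: estimate how much the key performance indicator moves per unit of the leading indicator, then use a permutation test to ask how often a relationship that strong would appear by chance. This is only one possible technique, chosen here because it needs nothing beyond the Python standard library. The function names and the numbers in the usage section are made up for illustration and are not results from any real company.

```python
import random
from statistics import mean


def slope(xs, ys):
    """Least-squares slope: estimated change in the KPI per unit of the indicator."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("leading indicator never varies; slope is undefined")
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx


def permutation_p_value(xs, ys, trials=10_000, seed=0):
    """One-sided p-value for the hypothesis that a higher indicator raises the KPI.

    Shuffling the KPI values breaks any real link to the indicator, so the
    share of shuffles whose slope is at least as large as the observed one
    estimates how easily chance alone could produce the observed effect.
    """
    rng = random.Random(seed)
    observed = slope(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if slope(xs, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (trials + 1)


if __name__ == "__main__":
    # Made-up illustrative data: designers on staff and a later satisfaction score.
    designers = [2, 2, 3, 3, 4, 5, 5, 6]
    satisfaction = [6.1, 6.4, 6.3, 6.9, 7.0, 7.4, 7.2, 7.8]
    print("estimated change per designer:", slope(designers, satisfaction))
    print("chance of seeing this by luck:", permutation_p_value(designers, satisfaction))
```

A small p-value supports the "yes" answer, and a large one corresponds to "no detectable influence." Neither proves causation on its own, since observational data like this can hide other explanations.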
The Importance of the Scientific Method
Notice, now, that we've pushed well past the "last mile." At this point, progress is not a matter of telling a compelling story and convincing someone of a particular worldview. Progress is a matter of deciding whether the evidence is strong enough to justify taking action. The whole process is simply a business adaptation of the scientific method (Figure 2-3).
This brand of data science may not be as exciting as the idea of taking unexplored data and discovering unexpected connections that change everything. But it works. The progress you make is steady and depends entirely on the hypotheses you choose to investigate.
That brings us to the main point. Many factors contribute to the success of a data science team, but achieving a competitive advantage from the work of your data scientists depends on the quality and format of the questions you ask.