Breaking Data Science Open by Christine Doig, Michele Chambers, Ian Stokes-Rees

Chapter 1. How Data Science Entered Everyday Business

Business intelligence (BI) has been evolving for decades as data has become cheaper, easier to access, and easier to share. BI analysts take historical data, perform queries, and summarize findings in static reports that often include charts. The outputs of business intelligence are “known knowns” that are manifested in stand-alone reports examined by a single business analyst or shared among a few managers.

Predictive analytics has been unfolding on a parallel track to business intelligence. With predictive analytics, numerous tools allow analysts to gain insight into “known unknowns,” such as where their future competitors will come from. These tools track trends and make predictions, but are often limited to specialized programs designed for statisticians and mathematicians.

Data science is a multidisciplinary field that combines the latest innovations in advanced analytics, including machine learning and artificial intelligence, with high-performance computing and visualizations. The tools of data science originated in the scientific community, where researchers used them to test and verify hypotheses that include “unknown unknowns,” and they have entered business, government, and other organizations gradually over the past decade as computing costs have shrunk and software has grown in sophistication. The finance industry was an early adopter of data science. Now it is a mainstay of retail, city planning, political campaigns, and many other domains.

Data science marks a significant break from traditional business intelligence and predictive analytics. It brings in data that is orders of magnitude larger than what previous generations of data warehouses could store, and it even works on streaming data sources. The analytical tools used in data science are also increasingly powerful, using artificial intelligence techniques to identify hidden patterns in data and pull new insights out of it. The visualization tools used in data science leverage modern web technologies to deliver interactive browser-based applications. Not only are these applications visually stunning, they also give their consumers rich context and relevance. Some of the changes driving the wider use of data science include:

The lure of Open Data Science

Open source communities want to break free from the shackles of proprietary tools and embrace a more open and collaborative work style that reflects the way they work with their teams all over the world. These communities are not just creating new tools; they’re calling on enterprises to use the right tools for the problem at hand. Increasingly, that means a wide array of programming languages, analytic techniques, analytic libraries, visualizations, and computing infrastructure. Popular tools for Open Data Science include the R programming language, which provides a wide range of statistical functionality, and Python, which is a quick-to-learn, fast prototyping language that can easily be integrated with existing systems and deployed into production. Both of these languages have thousands of analytics libraries that deliver everything from basic statistics to linear algebra, machine learning, deep learning, image and natural language processing, simulation, and genetic algorithms used to address complexity and uncertainty. Additionally, powerful visualization libraries range from basic plotting to fully interactive browser-based visualizations that scale to billions of points.

The gains in productivity from data science collaboration

The much-sought-after unicorn data scientist who understands everything about algorithms, data collection, programming, and your business might exist, but more often it’s a modern, collaborative data science team that gets the job done for an enterprise. Such a team combines the skills attributed to the unicorn data scientist and works across multiple areas of the business. Its members’ backgrounds span databases, statistics, programming, ETL (extract, transform, load), high-performance computing, Hadoop, machine learning, open source, subject matter expertise, business intelligence, and visualization. Data science collaboration tools facilitate workflows and interactions, typically based on an Agile methodology, so that work flows smoothly among team members. This highly interactive workflow helps teams progressively build and validate early-stage proofs of concept and prototypes while moving toward production deployments.

The efficiencies of self-service data science

Whereas predictive analytics was relegated to the back office and developed by mathematicians, data science has empowered entire teams, including frontline employees (often referred to as citizen data scientists), with intelligent applications and ubiquitous tools that are familiar to businesspeople and use spreadsheet- and browser-based interfaces. With these applications and tools, citizen data scientists can now perform their own predictive analyses and make evidence-based decisions.

The increasing ease of data science deployment

In the past, technology and cost barriers often prevented predictive analytics from moving into production. Today, with Open Data Science, both barriers are significantly lower, which has led to a rise in both new intelligent applications and intelligence embedded into devices and legacy applications.

What do the new data science capabilities mean for business users? Businesses continually seek competitive advantage, and there are many ways to use data and intelligence to underpin strategic, operational, and execution practices. Business users today, especially as millennials (comfortable with the open-ended capacities of Siri, Google Assistant, and Alexa) enter the workforce, expect an intelligent and personalized experience that helps them create value for their organization.

In short, data science drives innovation by arming everyone in an organization—from frontline employees to the board—with intelligence that connects the dots in data, bringing the power of new analytics to existing business applications and unleashing new intelligent applications. Data science can:

  • Uncover totally unanticipated relationships and changes in markets or other patterns
  • Help you change direction instantaneously
  • Constantly adapt to changing data
  • Handle streams of data—in fact, some embedded intelligent services make decisions and carry out those decisions automatically in microseconds

Data science enriches the value of data, going beyond what the data says to what it means for your organization. In other words, it turns raw data into intelligence that empowers everyone in your organization to innovate, increase sales, and become more cost-efficient. Data science is not just about the algorithm, but about deriving value.
