Breaking Data Science Open by Christine Doig, Michele Chambers, Ian Stokes-Rees

Preface

Data science has captured the public’s attention over the past few years as perhaps the hottest and most lucrative technology field. No longer just a buzzword for advanced analytical software, data science is poised to change everything about an organization: its potential customers, its expansion plans, its engineering and manufacturing processes, how it chooses and interacts with suppliers, and more. At the leading edge of this tsunami is Open Data Science, a combination of innovative business and technology trends that pairs open source software with cross-organizational collaboration and promises a more intelligent future. Open Data Science is a movement that makes the open source tools of data science (data, analytics, and computation) work together as a connected ecosystem.

Open Data Science, as we’ll explore in this report, combines developments in software, hardware, and organizational culture into something greater than the sum of its parts. The ongoing consumerization of technology has brought open source to the forefront, creating a marketplace of ideas where innovation emerges quickly and is vetted by millions of demanding users worldwide. These users industrialize products faster than any commercial technology company could. On top of this, the Agile trend fosters rapid experimentation and prototyping, which prompts modern data science teams to constantly generate and test new hypotheses, discarding many ideas and quickly arriving at the top 1 percent that can generate value and are worth pursuing. Agile has also led to the fusing of development and operations into DevOps, where the top ideas are quickly pushed into production deployment to reap value. All of this unfolds against a background of ever-growing data sources and data speeds (“Big Data”). This continuous cycle of innovation requires modern data science teams to use an evolving set of open source innovations to add higher levels of value without reinventing the wheel.

This report discusses the evolution of data science and the technologies behind Open Data Science, including data science collaboration, self-service data science, and data science deployment. Because Open Data Science is composed of these many moving pieces, we’ll discuss strategies and tools for making the technologies and people work together to realize their full potential.