
Breaking Data Science Open

Book Description

Over the past decade, data science has come out of the back office to become a force of change across the entire organization. At the forefront of this change is the open data science movement that advocates the use of open source tools in a powerful, connected ecosystem. This report explores how open data science can help your organization break free from the shackles of proprietary tools, embrace a more open and collaborative work style, and unleash new intelligent applications quickly.

Authors Michele Chambers and Christine Doig explain how open source tools have helped bring about many facets of the data science evolution, including collaboration, self-service, and deployment. But you’ll discover that open data science is about more than tools; it’s about a new way of working as an organization.

  • Learn how data science—particularly open data science—has become part of everyday business
  • Understand how open data science engages people from other disciplines, not just statisticians
  • Examine tools and practices that enable data science to be open across technical, operational, and organizational aspects
  • Learn the benefits of open data science, including rich resources, agility, transparency, and collective intelligence
  • Explore case studies that demonstrate different ways to implement open data science
  • Discover how open data science can help you break down department barriers and make bold market moves

Michele Chambers, Chief Marketing Officer and VP Products at Continuum Analytics, is an entrepreneurial executive with over 25 years of industry experience. Prior to Continuum Analytics, Michele held executive leadership roles at several database and analytic companies, including Netezza, IBM, Revolution Analytics, MemSQL, and RapidMiner.

Christine Doig is a senior data scientist at Continuum Analytics, where she's worked on several projects, including MEMEX, a DARPA-funded open data science project to help stop human trafficking. She has 5+ years of experience in analytics, operations research, and machine learning in a variety of industries.

Table of Contents

  Preface
  1. How Data Science Entered Everyday Business
  2. Modern Data Science Teams
  3. Data Science for All
    1. Open Source Software and Benefits of Open Data Science
    2. The Future of the Open Data Science Stack
  4. Open Data Science Applications: Case Studies
    1. Recursion Pharmaceuticals
    2. TaxBrain
    3. Lawrence Berkeley National Laboratory/University of Hamburg
  5. Data Science Executive Sponsorship
    1. Dynamic, Not Static, Investments
      1. Data Lab Environment
      2. Team Management, Processes, and Protocols
      3. Data Services and Data Lifecycle Management
      4. Infrastructure and Infrastructure Operations
    2. Executive Sponsorship Responsibilities
      1. Governance
      2. Provenance
      3. Reproducibility
  6. The Journey to Open Data Science
    1. Team
    2. Technology
    3. Migration
  7. The Open Data Science Landscape
    1. What the Open Data Science Community Can Do for You
    2. The Power of Open Data Science Languages
      1. Why R?
      2. Why Python?
      3. Why Scala?
      4. Why Julia?
    3. Established Open Data Science Technologies
      1. Notebooks and Narratives
      2. Hadoop/Spark
      3. Anaconda
      4. H2O
      5. pandas
      6. Scikit-Learn
      7. Caret
      8. Shiny
      9. Dask
    4. Emerging Open Data Science Technologies: Encapsulation with Docker and conda
    5. Open Source on the Rise
  8. Data Science in the Enterprise
    1. How to Bring Open Data Science to the Enterprise
      1. Governance
      2. Collaboration
      3. Operations
      4. Big Data
      5. Machine Learning and Artificial Intelligence
      6. Interactive Dashboards and Apps
      7. Self-Service Analytics
  9. Data Science Collaboration
    1. How Collaborative, Cross-Functional Teams Get Their Work Done
    2. Data Science Is a Team Sport
    3. Collaborating Across Multiple Projects
    4. Collaboration Is Essential for a Winning Data Science Team
  10. Self-Service Data Science
    1. Self-Service Data Science
      1. Meet Me Where I Am
      2. Make It Dead Simple
      3. Make It Intuitively Obvious
    2. Self-Service Is the Answer—But the Right Self-Service Is Needed
  11. Data Science Deployment
    1. What Data Scientists and Developers Bring to the Deployment Process
    2. The Traditional Way to Deploy
    3. Successfully Deploying Open Data Science
      1. Assets to Deploy
      2. Processes to Deploy
    4. Open Data Science Deployment: Not Your Daddy’s DevOps
  12. The Data Science Lifecycle
    1. Models As Living, Breathing Entities
    2. The Data Science Lifecycle
    3. Benefits of Managing the Data Science Lifecycle
    4. Data Science Asset Governance
    5. Model Lifecycle Management
      1. The Champion-Challenger Model
      2. Getting Over Hurdle Rates
    6. Other Data Science Model Evaluation Rates
    7. Keeping Your Models Relevant