Effective Data Science Infrastructure

by Ville Tuulos
August 2022
Intermediate to advanced
352 pages
11h 36m
English
Manning Publications

Overview

Simplify data science infrastructure to give data scientists an efficient path from prototype to production.

In Effective Data Science Infrastructure you will learn how to:

  • Design data science infrastructure that boosts productivity
  • Handle compute and orchestration in the cloud
  • Deploy machine learning to production
  • Monitor and manage performance and results
  • Combine cloud-based tools into a cohesive data science environment
  • Develop reproducible data science projects using Metaflow, Conda, and Docker
  • Architect complex applications for multiple teams and large datasets
  • Customize and grow data science infrastructure

Effective Data Science Infrastructure: How to make data scientists more productive is a hands-on guide to assembling infrastructure for data science and machine learning applications. It reveals the processes used at Netflix and other data-driven companies to manage their cutting-edge data infrastructure. In it, you’ll master scalable techniques for data storage, computation, experiment tracking, and orchestration that are relevant to companies of all shapes and sizes. You’ll learn how you can make data scientists more productive with your existing cloud infrastructure, a stack of open source software, and idiomatic Python.

The author is donating proceeds from this book to charities that support women and underrepresented groups in data science.

About the Technology
Growing data science projects from prototype to production requires reliable infrastructure. Using the powerful new techniques and tooling in this book, you can stand up an infrastructure stack that will scale with any organization, from startups to the largest enterprises.

About the Book
Effective Data Science Infrastructure teaches you to build data pipelines and project workflows that will supercharge data scientists and their projects. Based on state-of-the-art tools and concepts that power data operations at Netflix, this book introduces a customizable cloud-based approach to model development and MLOps that you can easily adapt to your company’s specific needs. As you roll out these practical processes, your teams will produce better and faster results when applying data science and machine learning to a wide array of business problems.

What's Inside
  • Handle compute and orchestration in the cloud
  • Combine cloud-based tools into a cohesive data science environment
  • Develop reproducible data science projects using Metaflow, AWS, and the Python data ecosystem
  • Architect complex applications that require large datasets and models, and a team of data scientists


About the Reader
For infrastructure engineers and engineering-minded data scientists who are familiar with Python.

About the Author
At Netflix, Ville Tuulos designed and built Metaflow, a full-stack framework for data science. Currently, he is the CEO of a startup focusing on data science infrastructure.

Quotes
By reading and referring to this book, I’m confident you will learn how to make your machine learning operations much more efficient and productive.
- From the Foreword by Travis Oliphant, Author of NumPy, Founder of Anaconda, PyData, and NumFOCUS

Effective Data Science Infrastructure is a brilliant book. It’s a must-have for every data science team.
- Ninoslav Cerkez, Logit

More data science. Less headaches.
- Dr. Abel Alejandro Coronado Iruegas, National Institute of Statistics and Geography of Mexico

Indispensable. A copy should be on every data engineer’s bookshelf.
- Matthew Copple, Grand River Analytics

Publisher Resources

ISBN: 9781617299193