Big Data Now: 2015 Edition

by O'Reilly Media, Inc.
January 2016
Content level: Beginner to intermediate
162 pages
3h 50m
English

Chapter 3. Data Pipelines

Engineering and optimizing data pipelines continues to be an area of particular interest, as researchers attempt to improve efficiency so they can scale to very large data sets. Workflow tools that enable users to build pipelines have also become more common; these days, such tools exist for data engineers, data scientists, and even business analysts. In this chapter, we present a collection of blog posts and podcasts that cover the latest thinking in the realm of data pipelines.

First, Ben Lorica explains why interactions between parts of a pipeline are an area of active research, and why we need tools to enable users to build certifiable machine learning pipelines. Michael Li then describes three best practices for building successful pipelines: reproducibility, consistency, and productionizability. Next, Kiyoto Tamura explores the ideal frameworks for collecting, parsing, and archiving logs, and also outlines the value of JSON as a unifying format. Finally, Gwen Shapira discusses how to simplify backend A/B testing using Kafka.

Building and Deploying Large-Scale Machine Learning Pipelines

There are many algorithms with implementations that scale to large data sets (this list includes matrix factorization, SVM, logistic regression, LASSO, and many others). In fact, machine learning experts are fond of pointing out: If you can pose your problem as a simple optimization problem then you’re ...


Publisher Resources

ISBN: 9781492042273