Patrick McFadin explains the basics of building more efficient data pipelines, using Apache Kafka to organize data, Apache Cassandra to store it, and Apache Spark to analyze it. Patrick offers an overview of how Cassandra works and why it can be a perfect fit for data-driven projects, then demonstrates how adding Spark and Kafka lets you maintain a highly distributed, fault-tolerant, and scalable solution. You’ll leave with a comprehensive view of the available options so you can make considered choices in your data pipeline projects.
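The description assigns each project a role: Kafka to organize incoming events, Cassandra to store them, and Spark to analyze them. Below is a minimal sketch of how those pieces are commonly wired together with Spark Structured Streaming and the Spark-Cassandra connector; it is not taken from the course, and the topic, keyspace, table, and connection settings are hypothetical placeholders.

```scala
// Sketch: read events from Kafka, process with Spark, persist to Cassandra.
// Topic, keyspace, table, and host values below are illustrative only.
import org.apache.spark.sql.{DataFrame, SparkSession}

object PipelineSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-cassandra")
      .config("spark.cassandra.connection.host", "127.0.0.1")
      .getOrCreate()

    // Consume a Kafka topic as a streaming DataFrame.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "sensor-events") // hypothetical topic
      .load()
      .selectExpr(
        "CAST(key AS STRING) AS sensor_id",
        "CAST(value AS STRING) AS reading",
        "timestamp")

    // Write each micro-batch to Cassandra via the Spark-Cassandra connector.
    val query = events.writeStream
      .foreachBatch { (batch: DataFrame, _: Long) =>
        batch.write
          .format("org.apache.spark.sql.cassandra")
          .option("keyspace", "pipelines")      // hypothetical keyspace
          .option("table", "sensor_readings")   // hypothetical table
          .mode("append")
          .save()
      }
      .start()

    query.awaitTermination()
  }
}
```

Because each stage is independently distributed and replicated, any single node in the Kafka, Spark, or Cassandra tier can fail without losing data, which is the fault-tolerance property the course emphasizes.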
Product information
- Title: Building Better Distributed Data Pipelines
- Release date: November 2017
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781492030997