Patrick McFadin explains the basics of building more efficient data pipelines, using Apache Kafka to organize data, Apache Cassandra to store it, and Apache Spark to analyze it. Patrick offers an overview of how Cassandra works and why it can be a perfect fit for data-driven projects. He then demonstrates that, with the addition of Spark and Kafka, you can maintain a highly distributed, fault-tolerant, and scalable solution. You'll leave with a comprehensive view of the available options, ready to make considered choices in your own data pipeline projects.
Product details
- Title: Building Better Distributed Data Pipelines
- Release date: November 2017
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781492030997