Video description
Patrick McFadin explains the basics of building more efficient data pipelines, using Apache Kafka to organize data, Apache Cassandra to store it, and Apache Spark to analyze it. Patrick offers an overview of how Cassandra works and why it can be a perfect fit for data-driven projects. He then demonstrates that, with the addition of Spark and Kafka, you can maintain a highly distributed, fault-tolerant, and scalable solution. You'll leave with a comprehensive view of the available options, so you can make considered choices in your own data pipeline projects.
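The organize/store/analyze split described above can be sketched in plain Python. The classes below are hypothetical in-memory stand-ins for each stage's role, not the real Kafka, Cassandra, or Spark client APIs: an append-only event buffer (Kafka's role), a key-partitioned store (Cassandra's role), and an aggregation over stored rows (Spark's role).

```python
from collections import defaultdict, deque

class EventLog:
    """Kafka's role: an append-only, ordered buffer of events."""
    def __init__(self):
        self._queue = deque()

    def publish(self, event):
        self._queue.append(event)

    def consume(self):
        # Drain events in the order they were published.
        while self._queue:
            yield self._queue.popleft()


class KeyValueStore:
    """Cassandra's role: rows grouped into partitions by key."""
    def __init__(self):
        self._partitions = defaultdict(list)

    def insert(self, key, value):
        self._partitions[key].append(value)

    def partition(self, key):
        return self._partitions[key]


def analyze(store, key):
    """Spark's role: an aggregation over stored rows (here, an average)."""
    rows = store.partition(key)
    return sum(rows) / len(rows) if rows else None


# Wire the three stages together on some made-up sensor readings.
log = EventLog()
store = KeyValueStore()
for sensor, reading in [("s1", 10.0), ("s2", 7.0), ("s1", 14.0)]:
    log.publish((sensor, reading))      # organize
for sensor, reading in log.consume():
    store.insert(sensor, reading)       # store
result = analyze(store, "s1")           # analyze: average of s1's readings
```

In a real deployment each stage is a separate distributed system, which is what gives the pipeline its fault tolerance and scalability; this sketch only shows how data flows between the roles.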
Product information
- Title: Building Better Distributed Data Pipelines
- Author(s): Patrick McFadin
- Release date: November 2017
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781492030997