Apache Kudu, the breakthrough storage technology, is often used in conjunction with other Hadoop ecosystem frameworks for data ingest, processing, and analysis. This is a practical, hands-on course that shows you how Kudu works with four of those frameworks: Apache Spark, Spark SQL, MLlib, and Apache Flume.
You'll use the Kudu-Spark module with Spark and Spark SQL to seamlessly create, move, and update data between Kudu and Spark, then use Apache Flume to stream events into a Kudu table, and finally query that table using Apache Impala. The course is designed for learners with limited experience using Hadoop ecosystem components like HDFS, Hive, Spark, or Impala.
- Get hands-on experience with Kudu and add more tools to your Big Data toolbox
- Learn how to move data between Kudu tables and Spark apps using the Kudu-Spark module
- Understand how to stream and analyze data in real time with Flume and Kudu
- Create a movie ratings predictor using MLlib and save the predicted values into Kudu
- See how these open source tools combine to create simple and fast data engineering pipelines
- Title: Using Kudu with Apache Spark and Apache Flume
- Release date: March 2017
- Publisher(s): Infinite Skills
- ISBN: 9781491985724