Data Stream Development with Apache Spark, Kafka, and Spring Boot

Video Description

Handle high volumes of data at high speed. Architect and implement an end-to-end data streaming pipeline.

About This Video

  • From blueprint architecture to complete code solution, this course covers every important aspect of architecting and developing a data streaming pipeline
  • Select the right tools and frameworks and follow the best approaches to designing your data streaming framework
  • Build an end-to-end data streaming pipeline from a real data stream (Meetup RSVPs) and expose the analyzed data in browsers via Google Maps

In Detail

Today, organizations struggle to work with large numbers of datasets. On top of that, data must be processed and analyzed in real time to yield insights. This is where data streaming comes in. Now that big data is no longer a niche topic, the skills to architect and develop robust data streaming pipelines are a must for developers. Developers also need to consider the pipeline as a whole, including the trade-offs at every tier.

This course starts by explaining the blueprint architecture for developing a fully functional data streaming pipeline and by installing the technologies used. Through live coding sessions, you will get hands-on experience architecting every tier of the pipeline. You will also handle specific issues encountered when working with streaming data. You will ingest a live data stream of Meetup RSVPs, which will be analyzed and displayed via Google Maps.

By the end of the course, you will have built an efficient data streaming pipeline and will be able to analyze its various tiers, ensuring a continuous flow of data.

All the code and supporting files for this course are available at https://github.com/PacktPublishing/-Data-Stream-Development-with-Apache-Spark-Kafka-and-Spring-Boot

Table of Contents

  1. Chapter 1 : Introducing Data Streaming Architecture
    1. The Course Overview 00:06:22
    2. Discovering the Data Streaming Pipeline Blueprint Architecture 00:17:37
    3. Analyzing Meetup RSVPs in Real-Time 00:05:59
  2. Chapter 2 : Deployment of Collection and Message Queuing Tiers
    1. Running the Collection Tier (Part I – Collecting Data) 00:20:40
    2. Collecting Data Via the Stream Pattern and Spring WebSocketClient API 00:06:51
    3. Explaining the Message Queuing Tier Role 00:06:19
    4. Introducing Our Message Queuing Tier – Apache Kafka 00:24:58
    5. Running the Collection Tier (Part II – Sending Data) 00:14:15
  3. Chapter 3 : Proceeding to the Data Access Tier
    1. Dissecting the Data Access Tier 00:18:18
    2. Introducing Our Data Access Tier – MongoDB 00:11:13
    3. Exploring Spring Reactive 00:24:48
    4. Exposing the Data Access Tier in Browser 00:09:46
  4. Chapter 4 : Implementing the Analysis Tier
    1. Diving into the Analysis Tier 00:19:09
    2. Streaming Algorithms for Data Analysis 00:29:14
    3. Introducing Our Analysis Tier – Apache Spark 00:18:19
    4. Plug-in Spark Analysis Tier to Our Pipeline 00:09:48
    5. Brief Overview of Spark RDDs 00:25:07
    6. Spark Streaming 00:28:37
    7. DataFrames, Datasets and Spark SQL 00:22:14
    8. Spark Structured Streaming 00:32:37
    9. Machine Learning in 7 Steps 00:20:51
    10. MLlib (Spark ML) 00:25:18
    11. Spark ML and Structured Streaming 00:23:46
    12. Spark GraphX 00:06:41
  5. Chapter 5 : Mitigate Data Loss between Collection, Analysis and Message Queuing Tiers
    1. Fault Tolerance (HML) 00:27:55
    2. Kafka Connect 00:04:19
    3. Securing Communication between Tiers 00:10:19