6

Architecting a Real-Time Processing Pipeline

In the previous chapter, we learned how to architect a big data solution for a high-volume, batch-based data engineering problem. We then learned how to profile big data using AWS Glue DataBrew. Finally, we learned how to choose among various technologies to build a complete Spark-based big data solution in the cloud.

In this chapter, we will discuss how to analyze, design, and implement a real-time data analytics solution to a business problem. We will learn how distributed messaging systems such as Apache Kafka, which stream and process the data, help achieve reliable and fast processing. We will also discuss how to write a Kafka Streams application ...

