Chapter 5. The Serving Layer: Apache Pinot
AATD has concluded that it needs a new piece of infrastructure to achieve scalable real-time analytics, but it isn’t yet convinced that a full-blown OLAP database is necessary.
In this chapter, we’ll start by explaining why we can’t just use a stream processor to serve queries on streams, before introducing Apache Pinot, one of the new breed of OLAP databases designed for real-time analytics.
We’ll learn about Pinot’s architecture and data model, before ingesting the orders stream.
After that, we’ll learn about timestamp indexes and how to write queries against Pinot using SQL.
Figure 5-1 shows how we’re going to evolve our infrastructure in this chapter.
Figure 5-1. Evolution of the orders service
Why Can’t We Use Another Stream Processor?
At the end of the last chapter, we described some of the limitations of using Kafka Streams to serve queries on top of streams. (See “Limitations of Kafka Streams”.) These limitations were by no means a criticism of Kafka Streams as a technology; we simply weren’t using it for the types of problems for which it was designed.
A reasonable question might be: why can’t we use another stream processor instead, such as ksqlDB or Flink? Both of these tools offer SQL interfaces, which removes the need to write Java code to query streams.
Unfortunately, it still doesn’t ...