May 2018
Beginner to intermediate
482 pages
11h 42m
English
Before we go into detail about Apache Flink, let's review, at a higher level, the types of datasets that you're likely to encounter when processing data, as well as the types of execution models you can choose for processing. These two ideas are often conflated; it will be useful to know what makes them different.
Firstly, there are two types of datasets: unbounded and bounded.
Many real-world datasets that are traditionally thought of as bounded or batch are, in reality, unbounded datasets. This is true whether the data is stored in a sequence of directories on HDFS, or in a log-based system, such as Apache Kafka.
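To make the difference between the kind of dataset and the kind of processing concrete, here is a minimal sketch in plain Python. It does not use any Flink API, and every function name in it is hypothetical and chosen only for illustration. It models a bounded dataset as a finite list, an unbounded dataset as a generator that never ends, and contrasts batch-style processing (one result after all input is read) with streaming-style processing (an updated result after every event).

```python
from itertools import count, islice
from typing import Iterable, Iterator, List


def bounded_events() -> List[int]:
    """A bounded dataset: finite and fully known up front (illustrative values)."""
    return [3, 1, 4, 1, 5]


def unbounded_events() -> Iterator[int]:
    """An unbounded dataset: a source that keeps producing values forever."""
    for n in count(start=1):
        yield n


def batch_sum(events: Iterable[int]) -> int:
    """Batch-style processing: consume the whole input, then emit one result."""
    return sum(events)


def running_sum(events: Iterable[int]) -> Iterator[int]:
    """Streaming-style processing: emit an updated result after every event."""
    total = 0
    for value in events:
        total += value
        yield total


if __name__ == "__main__":
    # Batch processing works on the bounded dataset because the input ends.
    print(batch_sum(bounded_events()))

    # Streaming processing also works on the bounded dataset.
    print(list(running_sum(bounded_events())))

    # On the unbounded dataset only the streaming style makes progress;
    # islice cuts the otherwise endless output off after four results.
    print(list(islice(running_sum(unbounded_events()), 4)))
```

Calling `batch_sum(unbounded_events())` would never return, because the input never ends. That is why the dataset type and the execution model are separate choices: the streaming-style function handles both kinds of input, while the batch-style function needs its input to be bounded.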
Some ...