Chapter 7: Extending Apache Beam's I/O Connectors

In previous chapters, we focused on writing data transformations that operate on data already read from data sources. There are two types of sources: bounded and unbounded. The difference between them is straightforward: a bounded source has a finite size (and this limit is known in advance), while an unbounded source is (possibly) infinite. A classic example of a bounded source is a file (or a set of immutable files), while an unbounded source is typically a streaming source such as Apache Kafka. Note that we can always convert an unbounded source to a bounded one by defining a bounding constraint. This could be, for example, the number of records that we want to read or the (processing ...
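
The idea of bounding an unbounded source by a record count can be illustrated without Beam at all. The following is a minimal, stdlib-only sketch, assuming an infinite `Iterator` as a stand-in for an unbounded source such as a Kafka topic. The class and method names (`BoundedFromUnbounded`, `unboundedCounter`, `readBounded`) are hypothetical and are not part of Beam's API. In Beam's Java SDK itself, the same constraint is typically expressed on the read transform, for example via `withMaxNumRecords(long)` on `KafkaIO.read()`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only: not a Beam API.
public class BoundedFromUnbounded {

    // Hypothetical stand-in for an unbounded source: an iterator that never
    // ends, emitting an ever-increasing sequence of record IDs.
    public static Iterator<Long> unboundedCounter() {
        return new Iterator<Long>() {
            private long next = 0;

            @Override
            public boolean hasNext() {
                return true; // an unbounded source always (potentially) has more data
            }

            @Override
            public Long next() {
                return next++;
            }
        };
    }

    // Applies a bounding constraint (a maximum number of records) to an
    // iterator, turning a possibly infinite stream into a finite list.
    public static <T> List<T> readBounded(Iterator<T> source, long maxRecords) {
        List<T> out = new ArrayList<>();
        while (out.size() < maxRecords && source.hasNext()) {
            out.add(source.next());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Long> firstFive = readBounded(unboundedCounter(), 5);
        System.out.println(firstFive); // prints [0, 1, 2, 3, 4]
    }
}
```

Because the bound is checked before each read, the loop terminates even though `hasNext()` never returns `false`; this mirrors how a record-count limit makes the resulting dataset bounded regardless of how much data the source could produce.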
