In the previous example, the pipeline operated on a single input file from Cloud Storage. Because this is a bounded input, the pipeline executed as a batch job. We can alternatively configure the pipeline to pull messages from a Cloud Pub/Sub topic, which is an unbounded dataset and hence results in a streaming job.
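The switch is confined to the source transform and the pipeline's streaming flag. The sketch below uses the Apache Beam Python SDK to illustrate the idea; the project and topic names are hypothetical placeholders, and the book's own example code may differ in language and structure.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

# Hypothetical topic path; substitute your own project and topic.
TOPIC = 'projects/my-project/topics/my-topic'

options = PipelineOptions()
# Pub/Sub is an unbounded source, so the pipeline must run in streaming mode.
options.view_as(StandardOptions).streaming = True

with beam.Pipeline(options=options) as p:
    messages = (
        p
        # Swapping ReadFromText (bounded, batch) for ReadFromPubSub
        # (unbounded, streaming) is the only change to the source stage;
        # downstream transforms can stay the same.
        | 'ReadMessages' >> beam.io.ReadFromPubSub(topic=TOPIC)
    )
```

Because the rest of the pipeline is expressed against a PCollection rather than a file, the same transforms apply whether the source is bounded or unbounded.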
In many cases, inferences need to be made against sets of data with a clear beginning and end. For bounded datasets, the beginning and end occur naturally as the boundaries of the dataset. Streaming datasets, however, lack such clearly defined beginnings and endings. To address this, many stream processing tools introduce the concept of windowing: imposing a start and an end on an otherwise unbounded stream so that computations can be carried out over finite slices of the data.
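As a rough sketch of what windowing looks like in the Beam Python SDK, the pipeline below divides the Pub/Sub stream into fixed 60-second windows and counts the messages in each window. The topic path, the 60-second window size, and the per-window count are illustrative assumptions, not the book's exact example.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions
from apache_beam.transforms import window

options = PipelineOptions()
options.view_as(StandardOptions).streaming = True

with beam.Pipeline(options=options) as p:
    counts_per_window = (
        p
        | 'ReadMessages' >> beam.io.ReadFromPubSub(
              topic='projects/my-project/topics/my-topic')
        # Impose 60-second fixed windows so downstream aggregations have a
        # clear beginning and end even though the stream itself does not.
        | 'Window' >> beam.WindowInto(window.FixedWindows(60))
        # Count the messages that fall into each window; without_defaults()
        # suppresses the global default output that CombineGlobally would
        # otherwise emit, which is required on an unbounded, windowed input.
        | 'CountPerWindow' >> beam.combiners.Count.Globally().without_defaults()
    )
```

Fixed windows are only one option; sliding and session windows follow the same pattern, differing only in how the start and end of each slice are determined.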