Chapter 5: Using SQL for Pipeline Implementation
In the previous chapter, we explored how to view a stream as a changing table and vice versa. We also recalled that a table has a fancy name, a relation, and that a table that changes over time is called a Time-Varying Relation (TVR). In this chapter, we will use this knowledge to make our lives easier when solving real-life problems. Instead of writing a full-blown pipeline in the Java SDK, which can sometimes be a little lengthy, we will use a well-known language to express our data transforms. As the name of this chapter suggests, this language will be Structured Query Language (SQL). The language itself needs some extensions to be able to manipulate TVRs, since the original version ...
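As a brief reminder of what viewing a stream as a changing table means, here is a minimal plain-Java sketch that does not use Beam at all. The class `TvrSketch`, the `Update` type, and the `snapshotAt` method are hypothetical names invented for this illustration, and the sketch assumes the updates arrive ordered by timestamp. Replaying the stream up to a given time yields the table as it looked at that time, and the collection of these snapshots over all times is what we call a TVR.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TvrSketch {

  /** One stream element: at time `timestamp`, the value stored under `key` becomes `value`. */
  public static final class Update {
    public final long timestamp;
    public final String key;
    public final long value;

    public Update(long timestamp, String key, long value) {
      this.timestamp = timestamp;
      this.key = key;
      this.value = value;
    }
  }

  /**
   * Stream -> table: replays all updates with timestamp <= time and keeps the
   * latest value per key. Assumes the stream is ordered by timestamp.
   */
  public static Map<String, Long> snapshotAt(List<Update> stream, long time) {
    Map<String, Long> table = new TreeMap<>();
    for (Update u : stream) {
      if (u.timestamp <= time) {
        table.put(u.key, u.value); // a later update overwrites an earlier one
      }
    }
    return table;
  }

  public static void main(String[] args) {
    List<Update> stream = Arrays.asList(
        new Update(1, "a", 1),
        new Update(2, "b", 5),
        new Update(3, "a", 2));

    // The same relation, observed at three different times.
    System.out.println(snapshotAt(stream, 1)); // {a=1}
    System.out.println(snapshotAt(stream, 2)); // {a=1, b=5}
    System.out.println(snapshotAt(stream, 3)); // {a=2, b=5}
  }
}
```

A classic SQL query runs against a single such snapshot. A query over a TVR conceptually has to give an answer for every point in time, which is why the language needs to be extended.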