Step 3 - Streaming data into HDFS

Flink also provides a number of connectors, including HDFS connectors that act as sinks. The HDFS connectors all follow a very similar structure. In this example, the sink receives messages from a Flink DataStream whose elements are tuples, and the data is written to HDFS as tuples as well. The class Flink provides for a two-field tuple is Tuple2.

A tuple is a finite ordered list of elements (https://en.wikipedia.org/wiki/Tuple).
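To make the tuple structure concrete, here is a minimal sketch in plain Java. The Pair class is a hypothetical stand-in written only for illustration; it is not Flink's Tuple2, although it borrows the field names f0 and f1 that Tuple2 exposes.

```java
import java.util.Objects;

// Hypothetical stand-in for a two-field tuple: two typed values held in a
// fixed order. Written for illustration only; this is not Flink's Tuple2.
final class Pair<A, B> {
    final A f0; // first element
    final B f1; // second element

    Pair(A f0, B f1) {
        this.f0 = f0;
        this.f1 = f1;
    }

    // Convenience factory so the type arguments can be inferred.
    static <A, B> Pair<A, B> of(A f0, B f1) {
        return new Pair<>(f0, f1);
    }

    // Two pairs are equal only if both positions hold equal values,
    // so the order of the elements matters.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Pair)) {
            return false;
        }
        Pair<?, ?> that = (Pair<?, ?>) other;
        return Objects.equals(f0, that.f0) && Objects.equals(f1, that.f1);
    }

    @Override
    public int hashCode() {
        return Objects.hash(f0, f1);
    }

    @Override
    public String toString() {
        return "(" + f0 + "," + f1 + ")";
    }
}

class TupleDemo {
    public static void main(String[] args) {
        // A message modelled as an (id, payload) pair.
        Pair<Integer, String> message = Pair.of(1, "hello");
        System.out.println(message);    // prints "(1,hello)"
        System.out.println(message.f0); // prints "1"
        System.out.println(message.f1); // prints "hello"
    }
}
```

In a real job the elements of the stream would be Flink Tuple2 values rather than this stand-in; the point here is only that each element carries exactly two typed fields in a fixed order.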

Any sink can be added to a Flink job by calling the addSink(...) method on a DataStream. The sink class used here is BucketingSink. The following code serves as a reference for understanding our example:

System.setProperty("HADOOP_USER_NAME", flinkProps.getProperty( ...
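As background, BucketingSink groups the files it writes into bucket directories under a base path, and by default those buckets are named after the time at which records arrive. The sketch below illustrates that idea in plain Java, without Flink. The HourlyBucketPath class, the namenode address, the hourly pattern, and the use of UTC are all assumptions made for illustration, not the book's code or Flink's implementation.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Flink: computes the directory a record
// would land in when buckets are named after the hour of arrival.
final class HourlyBucketPath {

    // Assumed hourly pattern; UTC is chosen here only to keep the output
    // deterministic.
    private static final DateTimeFormatter HOUR =
            DateTimeFormatter.ofPattern("yyyy-MM-dd--HH").withZone(ZoneOffset.UTC);

    // Returns basePath + "/" + the hour bucket for the given timestamp.
    static String bucketFor(String basePath, long epochMillis) {
        // Drop a trailing slash so the result never contains "//" after the base.
        String base = basePath.endsWith("/")
                ? basePath.substring(0, basePath.length() - 1)
                : basePath;
        return base + "/" + HOUR.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        // The HDFS address below is a placeholder, not a value from the example.
        String base = "hdfs://namenode:8020/data/events";
        System.out.println(bucketFor(base, 0L));
        // prints "hdfs://namenode:8020/data/events/1970-01-01--00"
        System.out.println(bucketFor(base, 3_600_000L));
        // prints "hdfs://namenode:8020/data/events/1970-01-01--01"
    }
}
```

Records that arrive within the same hour map to the same directory, which is what keeps the output on HDFS organised into manageable, time-ordered chunks.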
