September 2017
Beginner to intermediate
360 pages
8h 13m
English
The DataSet API is used for batch processing. It offers almost the same kinds of transformations as the DataStream API. The following snippet is a small word-count example using the DataSet API:
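import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<String> text = env.fromElements(
    "Who's there?",
    "I think I hear them. Stand, ho! Who's there?");
DataSet<Tuple2<String, Integer>> wordCounts = text
    .flatMap(new LineSplitter()) // emit (word, 1) pairs
    .groupBy(0)                  // group on field 0, the word
    .sum(1);                     // add up field 1, the counts
wordCounts.print();              // print() triggers execution of the batch job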
Here the execution environment differs from the one used with DataStream: it is ExecutionEnvironment rather than StreamExecutionEnvironment. The program performs the same word-count task as the earlier DataStream version, but on bounded data.
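Running the Flink snippet needs Flink on the classpath and a definition of LineSplitter, which this excerpt does not show. As a rough aid, here is a plain-JDK model (not Flink code) of what flatMap, groupBy(0) and sum(1) compute. It assumes LineSplitter splits each line on single spaces and emits (word, 1) for every token; that behavior, and the names WordCountModel, Pair, lineSplitter and groupBySum, are hypothetical choices for this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-JDK model (NOT Flink) of text.flatMap(new LineSplitter()).groupBy(0).sum(1).
public class WordCountModel {

    // Hypothetical stand-in for Flink's Tuple2<String, Integer>: f0 = word, f1 = count.
    record Pair(String f0, int f1) {}

    // ASSUMED LineSplitter behavior: split each line on single spaces and emit
    // (word, 1) for every token, so punctuation stays attached ("there?").
    static List<Pair> lineSplitter(List<String> lines) {
        List<Pair> out = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                out.add(new Pair(word, 1));
            }
        }
        return out;
    }

    // Models groupBy(0).sum(1): group on field 0 (the word) and add up field 1.
    static Map<String, Integer> groupBySum(List<Pair> pairs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Pair p : pairs) {
            counts.merge(p.f0(), p.f1(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> text = List.of(
            "Who's there?",
            "I think I hear them. Stand, ho! Who's there?");
        Map<String, Integer> wordCounts = groupBySum(lineSplitter(text));
        // Print in Flink's tuple style, e.g. (Who's,2)
        wordCounts.forEach((w, c) -> System.out.println("(" + w + "," + c + ")"));
    }
}
```

Under that assumption, "Who's", "there?" and "I" each get a count of 2 and the other five tokens a count of 1. The LinkedHashMap keeps first-seen order only for readability; Flink makes no ordering promise for grouped results, so the real job may print its tuples in a different order.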
The following transformations are available in the DataSet API, and they ...