Transformations

A transformation turns a DataSet into a new DataSet by applying its logic to each row of the original DataSet. The original DataSet is left unchanged. For example, to eliminate the header row from the input, we can use a filter() operation.
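To see what filter() does without starting Flink, the same predicate can be tried on an in-memory Scala List. This is a minimal sketch, not Flink code. The object name HeaderFilterSketch and the sample rows are hypothetical, and the header's column names are assumed to follow the 8-column OnlineRetail layout. As with a DataSet, filter() returns a new collection and leaves the original untouched.

```scala
// Flink-free sketch: the header-removal predicate applied to a plain List.
// HeaderFilterSketch and all rows below are hypothetical sample data.
object HeaderFilterSketch {
  val lines: List[String] = List(
    "InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country",
    "100001,A100,SAMPLE MUG,6,1/1/2020 9:00,2.50,12345,United Kingdom",
    "100003,C300,SAMPLE CANDLE,12,1/1/2020 9:10,0.85,12346,France"
  )

  // filter() builds a new List; `lines` still contains the header afterwards
  val withoutHeader: List[String] = lines.filter(!_.startsWith("InvoiceNo"))

  def main(args: Array[String]): Unit =
    withoutHeader.foreach(println)
}
```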

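The following code applies two filter() operations. The first removes the header row, and the second keeps only rows that have the expected number of columns, which is 8 in this case. A map() operation then extracts the third column (the product description), and the first 10 values are printed: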

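// benv is the batch ExecutionEnvironment predefined in the Flink Scala shell
val dataSet = benv.readTextFile("OnlineRetail.csv")
    .filter(!_.startsWith("InvoiceNo"))   // remove the header row
    .filter(_.split(",").length == 8)     // keep only rows with 8 columns
dataSet.map(x => x.split(",")(2))         // third column: product description
    .first(10).print()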

This prints the first 10 product descriptions from the loaded DataSet, as shown in the following output:

 WHITE HANGING HEART T-LIGHT HOLDER ...
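The same chain of lambdas can be checked on a plain Scala List before running it on Flink. In this sketch, the object name PipelineSketch and all rows are hypothetical. take(10) stands in for first(10), although on a parallel DataSet first(10) returns 10 arbitrary elements rather than the first 10 lines of the file. The quoted row also shows a limitation of split(","): it ignores CSV quoting, so a description that contains a comma produces 9 fields, and the length check silently drops that row.

```scala
// Flink-free sketch of the filter/filter/map pipeline on an in-memory List.
// PipelineSketch and all rows below are hypothetical sample data.
object PipelineSketch {
  val lines: List[String] = List(
    "InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country",
    "100001,A100,SAMPLE MUG,6,1/1/2020 9:00,2.50,12345,United Kingdom",
    // Quoted description with a comma: split(",") yields 9 fields here
    "100002,B200,\"BOX, SMALL\",4,1/1/2020 9:05,1.95,12345,United Kingdom",
    "100003,C300,SAMPLE CANDLE,12,1/1/2020 9:10,0.85,12346,France"
  )

  val descriptions: List[String] = lines
    .filter(!_.startsWith("InvoiceNo"))  // drop the header row
    .filter(_.split(",").length == 8)    // keep rows with exactly 8 fields
    .map(_.split(",")(2))                // third field: the description
    .take(10)                            // plays the role of first(10)

  def main(args: Array[String]): Unit =
    descriptions.foreach(println)
}
```

Running main prints SAMPLE MUG and SAMPLE CANDLE; the quoted row is gone. If such rows matter, a CSV-aware reader is needed instead of split(",").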

Get Big Data Analytics with Hadoop 3 now with the O’Reilly learning platform.