Suppose you have written an application in Scala that tells you how many distinct words a document or text file contains, as follows:
package com.chapter16.SparkTesting

import org.apache.spark._
import org.apache.spark.sql.SparkSession

class wordCounterTestDemo {
  // Create (or reuse) a local SparkSession for the demo
  val spark = SparkSession
    .builder
    .master("local[*]")
    .config("spark.sql.warehouse.dir", "E:/Exp/")
    .appName(s"OneVsRestExample")
    .getOrCreate()

  // Returns the number of distinct words in the given text file
  def myWordCounter(fileName: String): Long = {
    val input = spark.sparkContext.textFile(fileName)   // read the file as an RDD of lines
    val counts = input.flatMap(_.split(" ")).distinct() // split each line into words, drop duplicates
    val counter = counts.count()
    counter
  }
}
The preceding code parses a text file and performs a flatMap operation that splits each line into words. It then applies a distinct operation to remove duplicate words and, finally, a count action that returns the number of unique words in the file.
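To see the counter in action, you can instantiate the class and invoke myWordCounter with the path to a text file. The following is a minimal sketch, assuming a placeholder file path data/sample.txt that you would replace with a real file on your machine:

package com.chapter16.SparkTesting

object WordCounterApp {
  def main(args: Array[String]): Unit = {
    val demo = new wordCounterTestDemo()
    // The path below is a placeholder; point it at an existing text file
    val distinctWords = demo.myWordCounter("data/sample.txt")
    println(s"Number of distinct words: $distinctWords")
    demo.spark.stop() // release the local Spark resources
  }
}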