We can slightly modify the previous script to produce a sorted list, as follows:
import pyspark

# Create a SparkContext if one is not already available
if not 'sc' in globals():
    sc = pyspark.SparkContext()

text_file = sc.textFile("B09656_09_word_count.ipynb")

# Split each line into words, count each word, and sort the (word, count) pairs by key
sorted_counts = text_file.flatMap(lambda line: line.split(" ")) \
    .map(lambda word: (word, 1)) \
    .reduceByKey(lambda a, b: a + b) \
    .sortByKey()

for x in sorted_counts.collect():
    print(x)
This produces the following output:

The list continues for every word found. Notice the ordering of the results and how words with the same number of occurrences are sorted. The logic the script uses to determine word breaks, a simple split on single spaces, does not appear to be very robust. ...
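If you want the most frequent words listed first and cleaner tokenization, the script can be adjusted. The following is a minimal sketch, not taken from the book's code, that assumes the same SparkContext and input file as above: it splits on any run of non-word characters instead of single spaces, drops empty tokens, and orders the pairs by count in descending order using sortBy.

import re
import pyspark

# Assumes the same SparkContext setup as the earlier script
if not 'sc' in globals():
    sc = pyspark.SparkContext()

text_file = sc.textFile("B09656_09_word_count.ipynb")

# Split on non-word characters, remove empty strings, count words,
# and sort by count (highest first) instead of alphabetically by word
counts_by_frequency = text_file \
    .flatMap(lambda line: re.split(r'\W+', line)) \
    .filter(lambda word: word != "") \
    .map(lambda word: (word, 1)) \
    .reduceByKey(lambda a, b: a + b) \
    .sortBy(lambda pair: pair[1], ascending=False)

for x in counts_by_frequency.collect():
    print(x)

The regular expression used for splitting is only one possible choice; depending on the text being analyzed, you may also want to lowercase the words before counting so that capitalized and uncapitalized forms are grouped together.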