April 2018
We can use this script to see the word counts for a file:
import pyspark

# Create a SparkContext only if one does not already exist
if 'sc' not in globals():
    sc = pyspark.SparkContext()

# Read the file and build (word, count) pairs
text_file = sc.textFile("B09656_09_word_count.ipynb")
counts = text_file.flatMap(lambda line: line.split(" ")) \
    .map(lambda word: (word, 1)) \
    .reduceByKey(lambda a, b: a + b)

# Print each (word, count) tuple
for x in counts.collect():
    print(x)
When we run this in Jupyter, we see output similar to the following:

The output continues with one tuple for every distinct word detected in the source file.
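For readers without a Spark installation, the same flatMap/map/reduceByKey pipeline can be sketched in plain Python. This is only an illustration of the logic, not the book's code: the sample lines below are invented, and `collections.Counter` stands in for `reduceByKey`.

```python
from collections import Counter

# Stand-in for the lines of the input file (illustrative text only)
lines = [
    "spark makes word counts easy",
    "word counts with spark",
]

# flatMap: split each line into words and flatten into one list
words = [word for line in lines for word in line.split(" ")]

# map + reduceByKey: count occurrences of each word
counts = Counter(words)

# Print (word, count) tuples, mirroring the Spark script's output shape
for word, count in counts.items():
    print((word, count))
```

Each printed tuple has the same shape as the `(word, count)` pairs the Spark script collects.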