August 2018
282 pages
Our first script reads in a text file and computes the sum of its line lengths, as shown next. Note that we are reading in the Notebook file we are running; the Notebook is named Spark File Line Lengths, and is stored in the Spark File Line Lengths.ipynb file:
import pyspark
if not 'sc' in globals():
    sc = pyspark.SparkContext()
lines = sc.textFile("Spark File Line Lengths.ipynb")
lineLengths = lines.map(lambda s: len(s))
totalLengths = lineLengths.reduce(lambda a, b: a + b)
print(totalLengths)
In this script, we first initialize Spark, but only if we have not done so already. Spark raises an error if you try to create a second SparkContext in the same session, so all Spark scripts run from a Notebook should start with this if statement.
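To see what the map and reduce steps compute without a Spark installation, the same calculation can be sketched in plain Python. This is only an illustration under stated assumptions: the function name total_line_lengths and the sample lines are hypothetical and do not come from the Notebook, and the work runs locally rather than being distributed by Spark.

```python
from functools import reduce


def total_line_lengths(lines):
    """Sum the lengths of all lines, mirroring the map and reduce steps."""
    # Map step: turn each line into its length.
    line_lengths = map(lambda s: len(s), lines)
    # Reduce step: add the lengths pairwise.
    # The initial value 0 is an addition of this sketch, so that an
    # empty input returns 0 instead of raising an error.
    return reduce(lambda a, b: a + b, line_lengths, 0)


# Hypothetical sample input standing in for the lines of a text file.
sample_lines = ["import pyspark", "", "print(totalLengths)"]
total = total_line_lengths(sample_lines)
print(total)
```

The two lambdas are the same ones passed to lines.map and lineLengths.reduce in the script; only the container changes, from a Spark RDD to an ordinary Python list.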
The script reads ...