Building our first app with PySpark

We are now ready to check that everything is working. As the obligatory first test, we will run a word count on the first chapter of this book.

The code we will be running is listed here:

# Word count on 1st Chapter of the Book using PySpark

# import regex module
import re
# import add from operator module
from operator import add

# read input file (sc is the SparkContext provided by the PySpark shell)
file_in = sc.textFile('/home/an/Documents/A00_Documents/Spark4Py 20150315')

# count lines
print('number of lines in file: %s' % file_in.count())

# add up lengths of each line
chars = file_in.map(lambda s: len(s)).reduce(add)
print('number of characters in file: %s' % chars)

# Get words from the input file
words = file_in.flatMap(lambda line: ...
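To sanity-check the numbers a job like this reports, the same steps (count the lines, add up the line lengths, split lines into words) can be mirrored in plain Python on a small sample. The following is only a sketch and does not use Spark: `sample_lines` is a hypothetical stand-in for the chapter file, and the tokenizing rule (split on non-word characters, then lowercase) is an assumption, since the listing is cut off at that point.

```python
# Plain-Python analogue of the word count steps; no Spark required.
import re
from collections import Counter
from functools import reduce
from operator import add

# Hypothetical sample text standing in for the chapter file.
sample_lines = [
    "Spark makes big data simple",
    "PySpark brings Spark to Python",
    "",
]

# Mirrors the "count lines" step.
num_lines = len(sample_lines)

# Mirrors the "add up lengths of each line" step: map each line to
# its length, then reduce the lengths with add.
num_chars = reduce(add, map(len, sample_lines))

# Mirrors a flatMap: each line contributes zero or more words.
# Assumption: words are separated by runs of non-word characters.
words = [
    w.lower()
    for line in sample_lines
    for w in re.split(r'\W+', line)
    if w  # drop empty strings produced by blank lines
]

# Tally how often each word occurs.
word_counts = Counter(words)

print('number of lines in file: %s' % num_lines)
print('number of characters in file: %s' % num_chars)
print('number of words in file: %s' % len(words))
print('most common word: %s' % (word_counts.most_common(1),))
```

Running the Spark job and this sketch on the same small file should give matching line and character counts, which is a quick way to confirm the installation behaves as expected before moving to larger inputs.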

