Spark for Python Developers by Amit Nandi

Building our first app with PySpark

We are now ready to check that everything is working. The obligatory word count serves as the test: we will count the words in the first chapter of this book.

The code we will be running is listed here:

    # Word count on 1st Chapter of the Book using PySpark

    # import regex module
    import re
    # import add from operator module
    from operator import add

    # read input file
    file_in = sc.textFile('/home/an/Documents/A00_Documents/Spark4Py 20150315')

    # count lines
    print('number of lines in file: %s' % file_in.count())

    # add up lengths of each line
    chars = file_in.map(lambda s: len(s)).reduce(add)
    print('number of characters in file: %s' % chars)

    # Get words from the input file
    words = file_in.flatMap(lambda line: ...
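Because the listing is cut off at the `flatMap` step, the following is a minimal sketch of the same pipeline in plain Python rather than the book's code. It runs without a Spark context and mirrors the stages by analogy: counting lines, summing line lengths with `reduce(add)`, flattening lines into words, and counting per word (which `map` plus `reduceByKey(add)` would do on an RDD). The sample lines, the `tokenize` helper, and its splitting regex are all assumptions made for illustration.

```python
import re
from collections import Counter
from functools import reduce
from operator import add

# Hypothetical sample input; the book reads its first chapter from a file instead.
sample_lines = [
    "Spark makes word counts easy",
    "word counts are the hello world of Spark",
]

# Same two measurements as the listing: line count and total characters.
num_lines = len(sample_lines)
num_chars = reduce(add, map(len, sample_lines))


def tokenize(line):
    """Assumed tokenization: split on non-word runs, lowercase, drop empties."""
    return [w.lower() for w in re.split(r"\W+", line) if w]


# flatMap analogue: one flat list of words drawn from every line.
words = [w for line in sample_lines for w in tokenize(line)]

# Analogue of map to (word, 1) followed by reduceByKey(add).
counts = Counter(words)

print('number of lines in file: %s' % num_lines)
print('number of characters in file: %s' % num_chars)
print('most frequent words: %s' % counts.most_common(3))
```

On a real RDD the per-word counts would be computed across partitions; `Counter` stands in here only to make the logic easy to check locally before running it on a cluster.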
