Jupyter Cookbook

by Toomey, Nikhil Borkar, Nikhil Akki, Juan Tomás Oliva Ramos
April 2018
Beginner content level
238 pages
7h 13m
English
Packt Publishing

Running a Spark script

We can run a small Spark script that reads in a file and sums the lengths of its lines. We use lambda functions to map each line to its length and then reduce the lengths to a single total, in a Hadoop-style map/reduce fashion:

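import pyspark

# Reuse the SparkContext if one already exists in this Notebook session
if 'sc' not in globals():
    sc = pyspark.SparkContext()

# Load the file as an RDD with one element per line
lines = sc.textFile("B09656_02 Spark Sample.ipynb")
# Map each line to its length, then reduce the lengths to a single total
lineLengths = lines.map(lambda s: len(s))
totalLengths = lineLengths.reduce(lambda a, b: a + b)
print(totalLengths)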

That results in a screen that looks like this:

Note that we are running a Python 2 Notebook that calls upon the Spark (pyspark) library.
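To see what the map and reduce steps compute without a Spark installation, the same pattern can be sketched in plain Python. This is only an illustration under stated assumptions: the function name total_line_length and the sample text are hypothetical and do not come from the book, and the starting value of 0 passed to reduce is an addition of this sketch so that empty input yields 0.

```python
from functools import reduce


def total_line_length(text):
    """Sum the lengths of the lines in `text` using a map step and a reduce step."""
    # Split into lines without their trailing newline characters
    lines = text.splitlines()
    # Map step: each line becomes its length
    line_lengths = map(lambda s: len(s), lines)
    # Reduce step: fold the lengths into one total (0 is the starting value)
    return reduce(lambda a, b: a + b, line_lengths, 0)


# Hypothetical sample input standing in for the contents of a file
sample = "spark\nruns\nmap and reduce\n"
print(total_line_length(sample))
```

The lambdas are the same ones used in the Spark script. The difference is that Spark applies them across the partitions of an RDD, while this sketch applies them to an in-memory list.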


ISBN: 9781788839440