O'Reilly logo

Learning Jupyter by Dan Toomey

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Spark - evaluating history data

In this example, we combine the previous sections to look at some historical data and determine some useful attributes.

The historical data we are using is the guest list for The Jon Stewart Show. A typical record from the data looks like this:

1999,actor,1/11/99,Acting,Michael J. Fox

It contains the year, occupation of the guest, date of appearance, logical grouping of the occupation, and the name of the guest.

For our analysis, we will be looking at number of appearances per year, the most appearing occupation, and the most appearing personality.

We will be using this script:

import pyspark
import csv
import operator
import itertools
import collections
if not 'sc' in globals():
    sc = pyspark.SparkContext()
years ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required