Examining the popular-movies-nicer.py script

So let's see how broadcast variables let us transmit the table that maps movie IDs to movie names to whatever nodes our job might be running on. In the download package for this book, look for the popular-movies-nicer Python script, save it to your SparkCourse folder, and open it up. You can see I've added a few things to our previous script:

The first new thing I've added is this loadMovieNames function:

def loadMovieNames(): 

This loadMovieNames function creates a dictionary in Python that maps movie IDs to movie names. If you're not familiar with dictionaries in Python, they're like hash tables, ...
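To make the dictionary idea concrete, here is a minimal sketch of building an ID-to-name lookup in plain Python. It is not the body of loadMovieNames from the script: the helper name parseMovieNames, the pipe-delimited "id|title|..." line layout, and the sample rows are all assumptions made for illustration.

```python
def parseMovieNames(lines):
    """Build a dictionary mapping integer movie IDs to movie titles.

    Assumes each line looks like "id|title|..." (a hypothetical layout
    chosen for this sketch, not taken from the script itself).
    """
    movieNames = {}
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 2:
            # Skip blank or malformed lines rather than failing.
            continue
        movieNames[int(fields[0])] = fields[1]
    return movieNames


# Made-up sample rows standing in for a real data file.
sampleLines = [
    "1|Toy Story (1995)|01-Jan-1995",
    "2|GoldenEye (1995)|01-Jan-1995",
    "",
]

movieNames = parseMovieNames(sampleLines)
print(movieNames[1])  # look up a title by its movie ID
```

A dictionary like this is the kind of object you could hand to SparkContext.broadcast in a Spark job, with each task reading it back through the broadcast object's value attribute.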

