Visualization

There are many Python visualization packages, but in this section we will use only two of them, matplotlib and Bokeh, which between them should cover most of your charting needs.

Both packages come preinstalled with Anaconda. First, let's load the modules and set them up:

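%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('ggplot')  # mimic the look of R's ggplot2 charts

# The high-level bokeh.charts module (formerly imported as chrt) has been
# removed from recent Bokeh releases; bokeh.plotting is its maintained replacement
from bokeh.plotting import figure, show
from bokeh.io import output_notebook

output_notebook()  # render Bokeh charts inline in the notebook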

The %matplotlib inline and output_notebook() commands make every chart generated with matplotlib or Bokeh, respectively, appear within the notebook rather than in a separate window (or, for Bokeh, a standalone HTML file opened in the browser).

Histograms

Histograms are by far the easiest way to visually gauge the distribution of your features. There are three ways you can generate histograms ...
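Whichever library draws the chart, a histogram comes down to counting values into bins. The following is a minimal plain-Python sketch of that counting step, assuming equal-width bins spanning the data's range, with every bin half-open except the last, which also includes the maximum (the convention numpy.histogram uses). The function name equal_width_histogram is hypothetical and not part of matplotlib, Bokeh, or PySpark.

```python
def equal_width_histogram(values, n_bins):
    """Count values into n_bins equal-width bins spanning [min, max].

    Returns (edges, counts): edges has n_bins + 1 entries, counts has n_bins.
    Bins are half-open [lo, hi) except the last, which also includes max.
    Raises ValueError on empty input (min() of an empty sequence).
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = float(min(values)), float(max(values))
    if lo == hi:
        # All values identical: widen the range so bins have nonzero width
        lo, hi = lo - 0.5, hi + 0.5
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    edges[-1] = hi  # pin the final edge to avoid floating-point drift
    counts = [0] * n_bins
    for v in values:
        # Bin index for v; clamp so v == hi falls into the last (closed) bin
        i = min(int((v - lo) / width), n_bins - 1)
        counts[i] += 1
    return edges, counts


data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
edges, counts = equal_width_histogram(data, 3)
print(edges)   # [1.0, 2.0, 3.0, 4.0]
print(counts)  # [1, 2, 7]
```

The edges and counts can then be drawn as adjacent bars, for example with matplotlib's plt.bar using align='edge'. Separating the counting from the drawing also means only the small list of counts, not the raw data, has to reach the plotting library.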
