Interactive, exploratory analysis using Python and Pixiedust

Now that our ETL process is in place, let's switch to a more lightweight, Python-based programming environment for some exploratory data analysis to get a sense of what the data looks like. We'll use a visualization/charting library called Pixiedust for this.

The main advantage is that you can pass DataFrame objects to it directly, regardless of their size, and Pixiedust takes care of downsampling them where necessary. It can create a chart with a single line of code, whereas libraries such as matplotlib need considerably more code to produce similar charts. And the good news is that it is open source under the Apache V2 license and powered by IBM Watson Data Lab ...
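In a Jupyter notebook, the Pixiedust one-liner is typically `import pixiedust` followed by `display(df)`. To illustrate what "downsampling where necessary" means, here is a minimal pure-Python sketch that caps how many rows reach the charting layer. It is an assumption-laden illustration of the general idea, not Pixiedust's internal algorithm; the function name `downsample` and its parameters are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def downsample(rows: Sequence[T], max_points: int = 1000, seed: int = 42) -> List[T]:
    """Return at most max_points rows, chosen uniformly at random.

    Hypothetical helper for illustration only: this is NOT how Pixiedust
    implements its sampling, just the general idea behind it.
    """
    if max_points <= 0:
        raise ValueError("max_points must be positive")
    n = len(rows)
    if n <= max_points:
        # Small enough to chart directly, so no sampling is needed.
        return list(rows)
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    # Pick max_points distinct indices, then sort them so the original
    # row order (e.g. a time axis) is preserved in the sample.
    idx = sorted(rng.sample(range(n), max_points))
    return [rows[i] for i in idx]


if __name__ == "__main__":
    # One million (x, y) points would overwhelm a browser-based chart.
    data = [(t, t * t) for t in range(1_000_000)]
    sample = downsample(data, max_points=500)
    print(len(sample))  # 500
```

Sorting the sampled indices keeps the points in their original order, which matters for line charts over an ordered axis; a real charting library may instead use aggregation or bucketing rather than random sampling.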
