Working with large data sources

Most of the data that users feed into matplotlib when generating plots comes from NumPy. NumPy is one of the fastest ways of processing numerical and array-based data in Python (if not the fastest), so this makes sense. By default, however, NumPy works on in-memory data. If the dataset that you want to plot is larger than the total RAM available on your system, performance is going to plummet as the operating system starts swapping to disk.
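To make the memory constraint concrete, here is a minimal sketch that estimates how many bytes a dense NumPy array would occupy before you try to allocate it. The helper names `array_nbytes` and `fits_in_memory` are hypothetical and are not part of NumPy or matplotlib. The available-memory figure is passed in by the caller, and the billion-sample dataset is an illustrative assumption.

```python
import math

import numpy as np


def array_nbytes(shape, dtype=np.float64):
    """Return the bytes needed for a dense array of this shape and dtype.

    Hypothetical helper: element count multiplied by the size of one element.
    """
    return math.prod(shape) * np.dtype(dtype).itemsize


def fits_in_memory(shape, available_bytes, dtype=np.float64):
    """Return True if the array's raw data would fit in available_bytes.

    available_bytes is supplied by the caller; one possible source is
    psutil.virtual_memory().available.
    """
    return array_nbytes(shape, dtype) <= available_bytes


# Illustrative assumption: one billion float64 samples.
needed = array_nbytes((1_000_000_000,))
print(f"{needed / 2**30:.2f} GiB")  # prints "7.45 GiB"
```

The estimate covers only the raw array data. Intermediate copies made during processing or plotting can push the real requirement higher, so treat the result as a lower bound.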

In the following section, we're going to take a look at an example that illustrates this limitation. But first, let's get our notebook set up, as follows:

In [1]: import matplotlib
        matplotlib.use('nbagg')
        %matplotlib inline

Here are the modules that we are going to use:

In [2]: import glob, io, math, os
        import psutil
        import ...
