Chapter 17. Hierarchical Indexing

Up to this point we’ve been focused primarily on one-dimensional and two-dimensional data, stored in Pandas Series and DataFrame objects, respectively. Often it is useful to go beyond this and store higher-dimensional data, that is, data indexed by more than one or two keys. Early Pandas versions provided Panel and Panel4D objects that could be thought of as 3D or 4D analogs to the 2D DataFrame, but they were clunky to use in practice and have since been removed from the library. A far more common pattern for handling higher-dimensional data is to use hierarchical indexing (also known as multi-indexing) to incorporate multiple index levels within a single index. In this way, higher-dimensional data can be compactly represented within the familiar one-dimensional Series and two-dimensional DataFrame objects. (If you’re interested in true N-dimensional arrays with Pandas-style flexible indices, you can look into the excellent Xarray package.)
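The following is a minimal sketch of this idea. The site/year/sensor labels and the numbers are made up purely for illustration. It builds a Series whose three-level MultiIndex stands in for a three-dimensional dataset, then uses `unstack` to move one level into the columns of a DataFrame:

```python
import pandas as pd

# Hypothetical, made-up readings keyed by three labels: (site, year, sensor).
index = pd.MultiIndex.from_tuples(
    [
        ("A", 2020, "temp"), ("A", 2020, "humidity"),
        ("A", 2021, "temp"), ("A", 2021, "humidity"),
        ("B", 2020, "temp"), ("B", 2020, "humidity"),
        ("B", 2021, "temp"), ("B", 2021, "humidity"),
    ],
    names=["site", "year", "sensor"],
)
readings = pd.Series([21.5, 40.0, 22.0, 42.0, 18.5, 55.0, 19.0, 53.0], index=index)

# The three-key data lives in a one-dimensional Series with three index levels.
print(readings.index.nlevels)  # 3

# Moving the "sensor" level into the columns yields a 2D DataFrame whose
# rows are still indexed by the remaining (site, year) levels.
table = readings.unstack(level="sensor")
print(table)
```

The resulting DataFrame's row index is itself a two-level MultiIndex, so all three keys are still present without any dedicated 3D container.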

In this chapter, we’ll explore the direct creation of MultiIndex objects; considerations when indexing, slicing, and computing statistics across multiply indexed data; and useful routines for converting between simple and hierarchically indexed representations of data.
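The short sketch below previews the operations listed above. It assumes a recent pandas release, and the `outer`/`inner` labels and values are placeholders. It uses `pd.MultiIndex.from_product` to create an index, partial indexing and `pd.IndexSlice` for selection, `groupby(level=...)` for per-level statistics, and the `unstack`/`stack` pair to move between hierarchical and flat layouts:

```python
import numpy as np
import pandas as pd

# Create a MultiIndex from the Cartesian product of two label lists
# (placeholder labels and values, for illustration only).
index = pd.MultiIndex.from_product([["a", "b"], [1, 2, 3]], names=["outer", "inner"])
data = pd.Series(np.arange(6, dtype=float), index=index)

# Partial indexing on the outer level returns a Series indexed by "inner".
print(data["a"])

# Select across levels: every "outer" label, inner label 2.
# The index from from_product is already sorted, which slicing relies on.
print(data.loc[pd.IndexSlice[:, 2]])

# Aggregate a statistic separately for each value of one level.
print(data.groupby(level="outer").mean())

# Convert between the hierarchical Series and a flat 2D DataFrame.
wide = data.unstack()      # "inner" labels become columns
round_trip = wide.stack()  # columns folded back into the index
print(round_trip.equals(data))  # True
```

Depending on the pandas version, `stack()` may emit a FutureWarning about its reimplementation. With no missing values, as here, the round trip gives the same result either way.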

We begin with the standard imports:

In [1]: import pandas as pd
        import numpy as np

A Multiply Indexed Series

Let’s start by considering how we might represent two-dimensional data within a one-dimensional Series. For concreteness, we will consider a series of data where ...
