Chapter 17. Hierarchical Indexing
Up to this point we’ve been focused primarily on one-dimensional and two-dimensional data, stored in Pandas Series and DataFrame objects, respectively. Often it is useful to go beyond this and store higher-dimensional data, that is, data indexed by more than one or two keys. Early Pandas versions provided Panel and Panel4D objects that could be thought of as 3D or 4D analogs to the 2D DataFrame, but they were somewhat clunky to use in practice. A far more common pattern for handling higher-dimensional data is to make use of hierarchical indexing (also known as multi-indexing) to incorporate multiple index levels within a single index. In this way, higher-dimensional data can be compactly represented within the familiar one-dimensional Series and two-dimensional DataFrame objects. (If you’re interested in true N-dimensional arrays with Pandas-style flexible indices, you can look into the excellent Xarray package.)
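To make the idea of multiple index levels within a single index concrete, here is a minimal sketch. The level names (group, year) and the values are invented purely for illustration and do not come from the chapter's own data:

```python
import pandas as pd

# Hypothetical two-key data: each value is identified by (group, year).
# The labels and numbers are made up for illustration only.
index = pd.MultiIndex.from_product(
    [["A", "B"], [2010, 2020]], names=["group", "year"]
)
values = pd.Series([1.0, 2.0, 3.0, 4.0], index=index)

# Selecting on the outer level returns a Series indexed by the inner level.
group_a = values.loc["A"]
print(group_a)

# Supplying both keys as a tuple returns a single element.
single = values.loc[("B", 2020)]
print(single)
```

Although the Series itself is one-dimensional, each entry is addressed by two keys, which is what lets it stand in for two-dimensional data.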
In this chapter, we’ll explore the direct creation of MultiIndex objects; considerations when indexing, slicing, and computing statistics across multiply indexed data; and useful routines for converting between simple and hierarchically indexed representations of data.
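As a rough preview of two of these topics, the sketch below converts between a hierarchically indexed Series and a simple DataFrame, then computes a statistic over one index level. The data and level names are again hypothetical, and this is only one of several ways to do it:

```python
import pandas as pd

# Hypothetical data keyed by (group, year); values are illustrative only.
index = pd.MultiIndex.from_tuples(
    [("A", 2010), ("A", 2020), ("B", 2010), ("B", 2020)],
    names=["group", "year"],
)
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=index)

# unstack() moves the innermost level ("year") into the columns,
# giving a conventionally indexed DataFrame.
wide = s.unstack()
print(wide)

# stack() reverses the operation, returning a multiply indexed Series.
roundtrip = wide.stack()

# Aggregate across the inner level by grouping on the outer one.
group_means = s.groupby(level="group").mean()
print(group_means)
```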
We begin with the standard imports:
In [1]: import pandas as pd
        import numpy as np
A Multiply Indexed Series
Let’s start by considering how we might represent two-dimensional data within a one-dimensional Series. For concreteness, we will consider a series of data where ...