Chapter 21. Pivot Tables
We have seen how the groupby abstraction lets us explore relationships
within a dataset. A pivot table is a similar operation that is
commonly seen in spreadsheets and other programs that operate on tabular
data. The pivot table takes simple column-wise data as input, and groups
the entries into a two-dimensional table that provides a
multidimensional summarization of the data. The difference between pivot
tables and groupby can sometimes cause confusion; it helps me to think
of pivot tables as essentially a multidimensional version of groupby
aggregation. That is, you split-apply-combine, but both the split and
the combine happen not across a one-dimensional index, but across a
two-dimensional grid.
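To make the "two-dimensional groupby" picture concrete, here is a minimal sketch on a small made-up table. The column names (group, kind, value) and the numbers are hypothetical and chosen only for illustration. It computes the same means twice: once with a two-key groupby followed by unstack, and once with the pivot_table method of a DataFrame.

```python
import pandas as pd

# Hypothetical toy data for illustration only (not the Titanic dataset)
df = pd.DataFrame({
    'group': ['a', 'a', 'b', 'b', 'a', 'b'],
    'kind':  ['x', 'y', 'x', 'y', 'x', 'y'],
    'value': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Split on two keys, apply the mean, then unstack the second key
# so that the result is laid out on a two-dimensional grid
by_groupby = df.groupby(['group', 'kind'])['value'].mean().unstack()

# The same summary expressed directly as a pivot table:
# one key becomes the rows, the other becomes the columns
by_pivot = df.pivot_table(values='value', index='group',
                          columns='kind', aggfunc='mean')

print(by_groupby)
print(by_pivot)
```

Both results have one row per group and one column per kind, which is the sense in which a pivot table splits and combines along two dimensions at once.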
Motivating Pivot Tables
For the examples in this section, we’ll use the database of passengers on the Titanic, available through the Seaborn library (see Chapter 36):
In [1]: import numpy as np
        import pandas as pd
        import seaborn as sns
        titanic = sns.load_dataset('titanic')
In [2]: titanic.head()
Out[2]:
   survived  pclass     sex   age  sibsp  parch     fare embarked  class  \
0         0       3    male  22.0      1      0   7.2500        S  Third
1         1       1  female  38.0      1      0  71.2833        C  First
2         1       3  female  26.0      0      0   7.9250        S  Third
3         1       1  female  35.0      1      0  53.1000        S  First
4         0       3    male  35.0      0      0   8.0500        S  Third

     who  adult_male deck  embark_town alive  alone
0    man        True  NaN  Southampton    no  False
1  woman       False    C    Cherbourg   yes  False
2  woman       False  NaN  Southampton   yes   True
3  woman       False    C  Southampton   yes  False
4    man        True  NaN  Southampton ...