Chapter 20. Aggregation and Grouping
A fundamental piece of many data analysis tasks is efficient
summarization: computing aggregations like sum, mean, median,
min, and max, in which a single number summarizes aspects of a
potentially large dataset. In this chapter, we’ll explore
aggregations in Pandas, from simple operations akin to what
we’ve seen on NumPy arrays to more sophisticated operations
based on the concept of a groupby.
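As a rough preview of the two kinds of operation mentioned above, the following sketch applies a few simple aggregations to one column and then performs a basic groupby. The DataFrame `df`, its column names `key` and `value`, and all of its numbers are hypothetical toy data invented for this illustration; they do not come from any dataset used in this chapter.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, invented purely for illustration
df = pd.DataFrame({
    "key": ["A", "B", "A", "B", "A"],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Simple aggregations: each call reduces the whole column to a single number
total = df["value"].sum()        # 15.0
average = df["value"].mean()     # 3.0
middle = df["value"].median()    # 3.0
smallest = df["value"].min()     # 1.0
largest = df["value"].max()      # 5.0

# The same reduction on the underlying NumPy array gives the same result
numpy_total = np.sum(df["value"].to_numpy())

# A groupby computes the aggregate separately for each distinct key
per_key = df.groupby("key")["value"].sum()   # A -> 9.0, B -> 6.0

print(total, average, middle, smallest, largest)
print(per_key)
```

The first group of calls collapses the column to one value each, while the groupby returns one value per distinct entry in `key`.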
For convenience, we’ll use the same display helper class
that we used in the previous chapters:
In [1]: import numpy as np
        import pandas as pd

        class display(object):
            """Display HTML representation of multiple objects"""
            template = """<div style="float: left; padding: 10px;">
            <p style='font-family:"Courier New", Courier, monospace'>{0}</p>{1}
            </div>"""

            def __init__(self, *args):
                self.args = args

            def _repr_html_(self):
                # Each argument is a string; eval() turns it into the object to render
                return '\n'.join(self.template.format(a, eval(a)._repr_html_())
                                 for a in self.args)

            def __repr__(self):
                return '\n\n'.join(a + '\n' + repr(eval(a))
                                   for a in self.args)
Planets Data
Here we will use the Planets dataset, available via the Seaborn package (see Chapter 36). It gives information on planets that astronomers have discovered around other stars (known as extrasolar planets, or exoplanets for short). It can be downloaded with a simple Seaborn command:
In [2]: import seaborn as sns
        planets = sns.load_dataset('planets')
        planets.shape
Out[2]: (1035, 6)
In [3]: planets.head()
Out[3]:    method  number  orbital_period  mass  distance  year
        0  Radial ...