Skip to Content
Python Data Science Handbook, 2nd Edition
book

Python Data Science Handbook, 2nd Edition

by Jake VanderPlas
December 2022
Beginner to intermediate
588 pages
13h 43m
English
O'Reilly Media, Inc.
Content preview from Python Data Science Handbook, 2nd Edition

Chapter 20. Aggregation and Grouping

A fundamental piece of many data analysis tasks is efficient summarization: computing aggregations like sum, mean, median, min, and max, in which a single number summarizes aspects of a potentially large dataset. In this chapter, we’ll explore aggregations in Pandas, from simple operations akin to what we’ve seen on NumPy arrays to more sophisticated operations based on the concept of a groupby.

For convenience, we’ll use the same display magic function that we used in the previous chapters:

In [1]: import numpy as np
        import pandas as pd

        class display(object):
            """Display HTML representation of multiple objects"""
            template = """<div style="float: left; padding: 10px;">
            <p style='font-family:"Courier New", Courier, monospace'>{0}{1}
            """
            def __init__(self, *args):
                self.args = args

            def _repr_html_(self):
                return '\n'.join(self.template.format(a, eval(a)._repr_html_())
                                 for a in self.args)

            def __repr__(self):
                return '\n\n'.join(a + '\n' + repr(eval(a))
                                   for a in self.args)

Planets Data

Here we will use the Planets dataset, available via the Seaborn package (see Chapter 36). It gives information on planets that astronomers have discovered around other stars (known as extrasolar planets, or exoplanets for short). It can be downloaded with a simple Seaborn command:

In [2]: import seaborn as sns
        planets = sns.load_dataset('planets')
        planets.shape
Out[2]: (1035, 6)
In [3]: planets.head()
Out[3]:             method  number  orbital_period   mass  distance  year
        0  Radial ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

Python Data Science Handbook

Python Data Science Handbook

Jake VanderPlas

Publisher Resources

ISBN: 9781098121211Errata PageSupplemental Content