Python Data Science Handbook, 2nd Edition

by Jake VanderPlas
December 2022
Beginner to intermediate
588 pages
English
O'Reilly Media, Inc.

Chapter 18. Combining Datasets: concat and append

Some of the most interesting studies of data come from combining different data sources. These operations can involve anything from very straightforward concatenation of two different datasets to more complicated database-style joins and merges that correctly handle any overlaps between the datasets. Series and DataFrames are built with this type of operation in mind, and Pandas includes functions and methods that make this sort of data wrangling fast and straightforward.

Here we’ll take a look at simple concatenation of Series and DataFrames with the pd.concat function; later we’ll dive into more sophisticated in-memory merges and joins implemented in Pandas.
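To make the idea concrete before the setup that follows, here is a minimal sketch of pd.concat applied to two Series. The names ser1, ser2, and combined, along with their values, are illustrative assumptions rather than data from the text:

```python
import pandas as pd

# Two small Series with non-overlapping integer labels (illustrative values)
ser1 = pd.Series(['A', 'B', 'C'], index=[1, 2, 3])
ser2 = pd.Series(['D', 'E', 'F'], index=[4, 5, 6])

# pd.concat takes a list of objects and stacks them end to end,
# carrying each object's index labels along unchanged
combined = pd.concat([ser1, ser2])
print(combined)
```

The result is a single Series of six entries whose index runs through the labels of ser1 followed by those of ser2.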

We begin with the standard imports:

In [1]: import pandas as pd
        import numpy as np

For convenience, we’ll define this function, which creates a DataFrame of a particular form that will be useful in the following examples:

In [2]: def make_df(cols, ind):
            """Quickly make a DataFrame"""
            data = {c: [str(c) + str(i) for i in ind]
                    for c in cols}
            return pd.DataFrame(data, ind)

        # example DataFrame
        make_df('ABC', range(3))
Out[2]:     A   B   C
        0  A0  B0  C0
        1  A1  B1  C1
        2  A2  B2  C2
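As a rough illustration of how this helper can feed pd.concat, the sketch below builds two frames and stacks them row-wise. The helper is repeated so the snippet runs on its own; the names df1, df2, and stacked and the index values are assumptions chosen for the example:

```python
import pandas as pd

def make_df(cols, ind):
    """Quickly make a DataFrame (same helper as defined above)."""
    data = {c: [str(c) + str(i) for i in ind]
            for c in cols}
    return pd.DataFrame(data, ind)

# Two frames with the same columns but different index labels
df1 = make_df('AB', [1, 2])
df2 = make_df('AB', [3, 4])

# Row-wise concatenation (axis=0 is the default): rows of df2 follow rows of df1
stacked = pd.concat([df1, df2])
print(stacked)
```

Because each cell encodes its column letter and index label, it is easy to see at a glance which input frame a given row came from.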

In addition, we’ll create a quick class that allows us to display multiple DataFrames side by side. The code makes use of the special _repr_html_ method, which IPython/Jupyter uses to implement its rich object display:

In [3]: class display(object):
            """Display HTML representation of multiple objects"""
            template 
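The listing above is cut off in this preview. To illustrate the _repr_html_ mechanism it relies on, here is a separate, minimal sketch; it is not the author's class. The name SideBySide, its keyword-argument interface, and the inline styling are all hypothetical choices made for this example:

```python
import html
import pandas as pd

class SideBySide:
    """Hypothetical helper: show several labeled objects next to each other."""

    def __init__(self, **named_objects):
        # Keyword names double as the labels shown above each object
        self.named_objects = named_objects

    def _repr_html_(self):
        # IPython/Jupyter looks for this method to build a rich HTML display
        cells = []
        for name, obj in self.named_objects.items():
            if hasattr(obj, '_repr_html_'):
                body = obj._repr_html_()
            else:
                # Fall back to escaped plain text for objects without HTML output
                body = '<pre>' + html.escape(repr(obj)) + '</pre>'
            cells.append(
                '<div style="float: left; padding: 10px;">'
                f'<p>{html.escape(name)}</p>{body}</div>'
            )
        return '\n'.join(cells)

    def __repr__(self):
        # Plain-text fallback for terminals without HTML rendering
        return '\n\n'.join(f'{name}\n{obj!r}'
                           for name, obj in self.named_objects.items())

df1 = pd.DataFrame({'A': ['A1', 'A2']}, index=[1, 2])
df2 = pd.DataFrame({'A': ['A3', 'A4']}, index=[3, 4])
view = SideBySide(df1=df1, df2=df2)
html_out = view._repr_html_()
```

In a notebook, evaluating view as the last expression of a cell would render the two tables in floated blocks; in a plain terminal, the __repr__ fallback prints them one after another.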

ISBN: 9781098121211