Python Data Science Handbook, 2nd Edition

by Jake VanderPlas
December 2022
Beginner to intermediate
588 pages
13h 43m
English
O'Reilly Media, Inc.
Content preview from Python Data Science Handbook, 2nd Edition

Chapter 19. Combining Datasets: merge and join

One important feature offered by Pandas is its set of high-performance, in-memory join and merge operations, which may be familiar if you have ever worked with databases. The main interface for these is the pd.merge function, and we’ll see a few examples of how it works in practice.
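
As a quick orientation, here is a minimal sketch of a pd.merge call. The two DataFrames and their column names are hypothetical, invented only for illustration; they are not the datasets used later in the chapter.

```python
import pandas as pd

# Hypothetical example data, invented for illustration only.
employees = pd.DataFrame({
    "employee": ["Ana", "Ben", "Cho"],
    "group": ["Engineering", "Accounting", "Engineering"],
})
hire_dates = pd.DataFrame({
    "employee": ["Cho", "Ana", "Ben"],
    "hire_year": [2015, 2019, 2012],
})

# With no `on` argument, pd.merge joins on the columns the two frames
# share (here, "employee") and performs an inner join by default.
merged = pd.merge(employees, hire_dates)
print(merged)
```

The rows of the two inputs are matched by the value in the shared column rather than by position, so the differing row order of the hypothetical hire_dates frame does not matter.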

For convenience, we will again define the display function from the previous chapter after the usual imports:

In [1]: import pandas as pd
        import numpy as np

        class display(object):
            """Display HTML representation of multiple objects"""
            template = """<div style="float: left; padding: 10px;">
            <p style='font-family:"Courier New", Courier, monospace'>{0}</p>{1}
            </div>"""
            def __init__(self, *args):
                # Each argument is a string naming an object to display
                self.args = args

            def _repr_html_(self):
                # Rich (HTML) output: the name, then the object's own HTML
                return '\n'.join(self.template.format(a, eval(a)._repr_html_())
                                 for a in self.args)

            def __repr__(self):
                # Plain-text fallback: the name, then the object's repr
                return '\n\n'.join(a + '\n' + repr(eval(a))
                                   for a in self.args)

Relational Algebra

The behavior implemented in pd.merge is a subset of what is known as relational algebra, which is a formal set of rules for manipulating relational data that forms the conceptual foundation of operations available in most databases. The strength of the relational algebra approach is that it proposes several fundamental operations, which become the building blocks of more complicated operations on any dataset. With this lexicon of fundamental operations implemented efficiently in a database or other program, a wide range of fairly complicated ...
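
To make the idea of building blocks concrete, the following sketch composes three operations commonly described in relational algebra (join, selection, and projection) using Pandas. The tables, column names, and the particular query are assumptions made up for this illustration, and the mapping onto Pandas calls is one possible reading rather than a definitive one.

```python
import pandas as pd

# Hypothetical tables, invented for illustration only.
staff = pd.DataFrame({
    "name": ["Ana", "Ben", "Cho"],
    "dept": ["Engineering", "Accounting", "Engineering"],
})
depts = pd.DataFrame({
    "dept": ["Accounting", "Engineering"],
    "floor": [2, 5],
})

# Join: combine rows of the two tables that agree on "dept".
joined = pd.merge(staff, depts, on="dept")

# Selection: keep only the rows that satisfy a predicate.
selected = joined[joined["floor"] == 5]

# Projection: keep only the columns of interest.
result = selected[["name", "floor"]].reset_index(drop=True)
print(result)
```

Each step takes a table and returns a table, which is what lets simple operations be chained into a more involved query.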

ISBN: 9781098121211