Chapter 14. Joining and Concatenating
Data often comes from multiple sources that you will have to connect and combine in a meaningful way. There are multiple ways to combine DataFrames, which we’ll go over in this chapter.
Funnily enough, this is where Polars started. Faced with combining two CSV files in Rust, Ritchie Vink began the journey that ultimately led to where we are now. That history gives the operations in this chapter a certain sentimental value.
In this chapter, you’ll learn:
- That you can use df.join() to combine DataFrames based on the values in the DataFrames and the strategies outlined here
- That df.join_asof() is a special join that joins DataFrames based on the nearest value in the other DataFrame
- How to combine DataFrames using pl.concat(), df.vstack(), df.hstack(), and df.extend()
- How to combine Series with series.append()
- The differences between all these methods and when to use them
The instructions to get any files you might need are in Chapter 2. We assume that you have the files in the data subdirectory.
Joining
To combine different DataFrames, Polars offers the df.join() method.
It takes the arguments listed in Table 14-1.
| Argument | Description |
|---|---|
| other | The DataFrame to join with. |
| on | The column to join on when the name is the same in the left and right DataFrames. |
| left_on, right_on | The columns to join on if they have different names in the left and right DataFrames. |
| how | The join strategy ... |