10 Join Data Frames

The join operation between data frames is among the most important operations on data. It is technically powerful, but more importantly it is one of the pillars of the creativity and exploration intrinsic to data science. Viewing data science as merely a set of technicalities and logical or statistical skills would badly misrepresent the sense and nature of the discipline, which is to discover knowledge buried deep in data. Discovering knowledge is not just a mechanistic or stochastic process. It is a creative process that requires curiosity and imagination, the desire to know more and better about unfamiliar phenomena, and the ability to observe the nuances of a reality that seldom fits an easy categorization. The join operation is fundamental because it lets us logically combine different data frames through shared characteristics, typically the values of one or more key columns. It allows us to state that an observation in one data frame belongs together with an observation in another data frame, because both are parts of a more complete observation. It is like watching a scene from two different perspectives: the two views differ because they describe what happens from different angles, yet they describe the same scene, so they can be joined to form a more comprehensive description. This is the invaluable role of join operations.
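As a minimal sketch of the idea of combining two data frames through a shared characteristic, the following Python snippet uses pandas' `merge` function. The data frames `locations` and `visitors`, their columns, and their values are hypothetical toy data invented for illustration; the key column `museum_id` plays the role of the shared characteristic that links an observation in one data frame to an observation in the other.

```python
import pandas as pd

# Hypothetical toy data: two "perspectives" on the same museums.
# One data frame describes where each museum is...
locations = pd.DataFrame({
    "museum_id": [1, 2, 3],
    "name": ["Museum A", "Museum B", "Museum C"],
    "city": ["Rome", "Milan", "Turin"],
})
# ...the other records (invented) visitor counts.
visitors = pd.DataFrame({
    "museum_id": [2, 3, 4],
    "visitors": [1200, 800, 450],
})

# Inner join on the shared key: only museum_id values present in both
# data frames (2 and 3) are kept, and each matched row gains the
# columns of the other data frame.
joined = pd.merge(locations, visitors, on="museum_id", how="inner")

# Left join: every row of `locations` is kept; museum 1 has no match
# in `visitors`, so its `visitors` value is missing (NaN).
left_joined = pd.merge(locations, visitors, on="museum_id", how="left")

print(joined)
print(left_joined)
```

The `how` parameter decides what happens to observations without a counterpart: the inner join keeps only the matched museums 2 and 3, while the left join keeps all three museums from `locations` and marks the missing visitor count for museum 1 as `NaN`. Museum 4 appears in neither result, because it exists only in `visitors`.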

Several other operations let you combine data frames. They are usually said to concatenate or bind data frames, either by columns ...
