4 Subsetting with Logical Conditions

In this chapter, we introduce logical conditions and the main logical operators. These are the key elements for selection operations based on logical criteria, rather than on a simple list of items or on selection helpers. Most of all, we turn our attention from the columns to the rows of a data frame. We have to assume that we may need to extract a subset of rows from thousands of rows (still a small dataset) or from hundreds of thousands or millions of rows (already a large dataset). As a general rule of thumb, therefore, no manual approach based on scrolling through the data and listing rows is suitable. It is by defining logical conditions and combining them that we can express elaborate criteria to extract subsets of rows from real datasets.
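As a minimal sketch of the idea, the following Python example uses pandas to build a logical condition, combine it with a second one, and extract the matching rows. The data frame, its column names, and its values are invented for illustration and do not come from the chapter's datasets.

```python
import pandas as pd

# Hypothetical toy data: column names and values are invented for illustration.
df = pd.DataFrame({
    "city": ["Rome", "Milan", "Turin", "Naples", "Venice"],
    "population": [2_800_000, 1_400_000, 850_000, 910_000, 250_000],
    "region": ["Lazio", "Lombardy", "Piedmont", "Campania", "Veneto"],
})

# A logical condition evaluates to one True/False value per row.
is_large = df["population"] > 900_000

# Conditions combine with & (and), | (or), and ~ (not). Each comparison is
# wrapped in parentheses because these operators bind more tightly than
# comparison operators in Python.
large_not_lazio = (df["population"] > 900_000) & ~(df["region"] == "Lazio")

# Indexing with the Boolean mask keeps only the rows where it is True.
subset = df[large_not_lazio]
print(subset)
```

The same pattern works unchanged whether the data frame has five rows or five million, which is why it replaces any manual listing of rows.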

To be more specific, turning our attention from columns to rows does not mean that logical conditions apply only to row selection. Selection based on logical conditions applies equally to rows and columns. However, in open data and real data in general, the number of rows typically exceeds the number of columns by many orders of magnitude. The scale of datasets is not just a technicality: it is a characteristic deeply ingrained in data science and a pillar of both the R and Python environments, which have been developed, and continue to be improved, to deal with the content and meaning of data as well as with their ever-increasing scale. Therefore, ...
