Chapter 15. Regular Expressions

Introduction

In Chapter 14, you learned a whole bunch of useful functions for working with strings. This chapter will focus on functions that use regular expressions, a concise and powerful language for describing patterns within strings. The term regular expression is a bit of a mouthful, so most people abbreviate it to regex or regexp.
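To make “describing a pattern” concrete, here is a minimal sketch using stringr’s str_detect(), which reports whether each string contains a match for a pattern. The vector x is made up for illustration. The second pattern previews anchors and character classes, which this chapter covers in detail later.

```r
library(stringr)

# A small, made-up character vector for illustration
x <- c("apple", "banana", "pear")

# "an" matches an "a" immediately followed by an "n"
str_detect(x, "an")
#> [1] FALSE  TRUE FALSE

# "[aeiou]$" matches any vowel that is the last character of the string
str_detect(x, "[aeiou]$")
#> [1]  TRUE  TRUE FALSE
```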

The chapter starts with the basics of regular expressions and the most useful stringr functions for data analysis. We’ll then expand your knowledge of patterns and cover seven important new topics (escaping, anchoring, character classes, shorthand classes, quantifiers, precedence, and grouping). Next, we’ll talk about some of the other types of patterns that stringr functions can work with and the various “flags” that allow you to tweak the operation of regular expressions. We’ll finish with a survey of other places in the tidyverse and base R where you might use regexes.

Prerequisites

In this chapter, we’ll use regular expression functions from stringr and tidyr, both core members of the tidyverse, as well as data from the babynames package:

library(tidyverse)
library(babynames)

Throughout this chapter, we’ll use a mix of simple inline examples so you can get the basic idea, the baby names data, and three character vectors from stringr:

  • fruit contains the names of 80 fruits.
  • words contains 980 common English words.
  • sentences contains 720 short sentences.
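If you want to check what these vectors hold before working with them, a quick sketch is to look at their lengths and first few elements. This assumes only that stringr is loaded, which library(tidyverse) already does.

```r
library(stringr)

# Number of elements in each of stringr's practice vectors
length(fruit)
#> [1] 80
length(words)
#> [1] 980
length(sentences)
#> [1] 720

# Peek at the first few entries of each
head(fruit, 3)
head(words, 3)
head(sentences, 2)
```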

Pattern Basics

We’ll use str_view() to learn how ...
