Getting errant data into a standardized format is sometimes called data normalization. It is often critical for a wide range of common data manipulation tasks, such as matching or retrieving records and aggregation.
To demonstrate the importance of data normalization, watch what happens when we try to match all titles containing an apostrophe in the following code:
> lib$TITLE %>% str_subset("'")
[1] "Are You There, God? It's Me, Margaret"
Fans of modernist Irish literature everywhere yell, "What about Finnegans Wake?" What indeed:
> lib$TITLE %>% str_subset("’")
[1] "Finnegan’s Wake"
If you look closely, you might notice a slight aesthetic difference between the two apostrophes. ...
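One possible way to normalize the two apostrophes is sketched below. It is only an illustration under stated assumptions: `titles` is a hypothetical stand-in for `lib$TITLE` holding just the two titles shown above, the straight apostrophe is taken to be U+0027 and the curly one U+2019 (U+2018 is included as a likely companion), and base R's `gsub()` is used instead of stringr so the snippet runs without extra packages.

```r
# Hypothetical stand-in for lib$TITLE, containing only the two titles shown above
titles <- c("Are You There, God? It's Me, Margaret",
            "Finnegan\u2019s Wake")

# Replace the typographic quotation marks U+2018 and U+2019
# with the plain ASCII apostrophe U+0027
normalized <- gsub("[\u2018\u2019]", "'", titles)

# After normalization, one pattern can match both titles
matches <- grep("'", normalized, fixed = TRUE, value = TRUE)
print(matches)
```

With the apostrophes mapped to a single character, a single search pattern can cover both records, which is the kind of matching task that normalization is meant to make reliable.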