March 2018
Beginner to intermediate
570 pages
13h 42m
English
The process of getting errant data into a standardized format is sometimes called data normalization. It can be critical for a wide range of common data manipulation tasks, such as matching or retrieving records and aggregation.
To demonstrate the importance of data normalization, watch what happens when we try to match all titles containing an apostrophe in the following code:
> lib$TITLE %>% str_subset("'")
[1] "Are You There, God? It's Me, Margaret"
Fans of modernist Irish literature everywhere yell, "What about Finnegans Wake?" What indeed:
> lib$TITLE %>% str_subset("’")
[1] "Finnegan’s Wake"
If you look closely, you might notice a slight aesthetic difference between the two apostrophes. ...
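One way to make both titles match a single pattern is to map the typographic apostrophe (U+2019) onto the plain ASCII apostrophe before searching. The following is a minimal sketch rather than the book's own solution: the `titles` vector is a hypothetical stand-in for `lib$TITLE`, built only from the two titles shown in the output above, and `normalized` is an illustrative name.

```r
library(stringr)

# Hypothetical stand-in for lib$TITLE, using the two titles from the output above.
titles <- c("Are You There, God? It's Me, Margaret",
            "Finnegan\u2019s Wake")

# Replace every typographic apostrophe (U+2019) with the ASCII apostrophe (U+0027).
normalized <- str_replace_all(titles, "\u2019", "'")

# After normalization, a single pattern is enough to find titles with an apostrophe.
str_subset(normalized, "'")
```

Normalizing once, up front, means later matching and aggregation steps do not each need to account for both characters.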