Chapter 7. Strings and Factors
As well as dealing with numbers and logical values, at some point you will almost certainly need to manipulate text. This is particularly common when you are retrieving or cleaning datasets. Perhaps you are trying to turn the text of a log file into meaningful values, or correct the typos in your data. These data-cleaning activities will be discussed in more depth in Chapter 13, but for now, you will learn how to manipulate character vectors.
Factors are used to store categorical data, such as gender (“male” or “female”), where a string can take only a limited number of possible values. They sometimes behave like character vectors and sometimes like integer vectors, depending upon context.
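As a minimal sketch of this dual behavior, the base R example below creates a factor and then looks at it both as integer codes and as text. The variable name `gender` and its values are illustrative only.

```r
# Illustrative categorical data; the name `gender` is just an example
gender <- factor(c("male", "female", "female", "male"))

levels(gender)        # "female" "male": the distinct values, sorted by default
nlevels(gender)       # 2
as.integer(gender)    # 2 1 1 2: each value is stored as an integer code into levels
as.character(gender)  # "male" "female" "female" "male": converted back to text
```

Storing each value as an integer code, plus a single copy of each level, is what lets a factor act like an integer vector in some contexts and like a character vector in others.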
After reading this chapter, you should:
- Be able to construct new strings from existing strings
- Be able to format how numbers are printed
- Understand special characters like tab and newline
- Be able to create and manipulate factors
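For a quick taste of the first three goals, here is a small base R sketch. It uses `paste`, `paste0`, `sprintf`, `formatC`, `nchar`, and `cat`; the variable names are hypothetical and chosen only for illustration.

```r
# Construct new strings from existing ones
greeting  <- paste("Hello", "world")         # "Hello world" (space separator by default)
greeting2 <- paste0("Hello", ", ", "world")  # "Hello, world" (no separator)

# Control how numbers are printed
sprintf("%.2f", pi)                              # "3.14"
formatC(1234567L, big.mark = ",", format = "d")  # "1,234,567"

# Special characters: \t is a tab and \n is a newline; each counts as one character
nchar("a\tb")                # 3
cat("name\tvalue\nx\t1\n")   # prints a small tab-separated table over two lines
```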
Text data is stored in character vectors (or, less commonly, character arrays). It’s important to remember that each element of a character vector is a whole string, rather than just an individual character. In R, “string” is an informal term that is used because “element of a character vector” is quite a mouthful.
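The distinction between elements and characters is easy to see with `length` and `nchar`. In this minimal sketch, the vector `x` and the matrix `m` are illustrative names.

```r
# A character vector with two elements; each element is a complete string
x <- c("apple", "banana split")
length(x)   # 2: the number of strings, not the number of characters
nchar(x)    # 5 12: the number of characters in each string

# Reaching individual characters requires splitting a string explicitly
strsplit(x[1], "")[[1]]   # "a" "p" "p" "l" "e"

# A (less common) character array: here a 2 x 2 matrix whose elements are strings
m <- matrix(c("a", "b", "c", "d"), nrow = 2)
dim(m)      # 2 2
```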
The fact that the basic unit of text is a character vector means that most string manipulation functions operate on vectors of strings, in the same way that mathematical operations are vectorized.
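As a short sketch of this vectorization, each base R function below is applied to a whole character vector at once, with no explicit loop. The vector `fruit` is just an example.

```r
fruit <- c("apple", "banana", "cherry")

toupper(fruit)        # "APPLE" "BANANA" "CHERRY"
substr(fruit, 1, 3)   # "app" "ban" "che"
paste(fruit, "pie")   # "apple pie" "banana pie" "cherry pie": "pie" is recycled
nchar(fruit)          # 5 6 6
```

As with arithmetic, a shorter argument such as the single string `"pie"` is recycled to match the length of the longer vector.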