Recall from Chapter 1, Introducing Machine Learning, that features representing a characteristic with categories of values are known as nominal. Although it is possible to use a character vector to store nominal data, R provides a data structure known as a factor specifically for this purpose. A factor is a special case of vector that is solely used for representing nominal variables. In the medical dataset we are building, we might use a factor to represent gender, because it uses two categories: MALE and FEMALE.
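As a minimal sketch of how such a factor might be created, the following builds one from a character vector with the factor() function. The variable name gender and the three patient values are hypothetical, chosen only for illustration.

```r
# Hypothetical gender values for three patients (illustrative data only)
gender <- factor(c("MALE", "FEMALE", "MALE"))

# Printing a factor shows its values followed by its levels
print(gender)

# levels() returns the distinct categories; by default they are sorted,
# so this gives "FEMALE" "MALE"
levels(gender)
```

Note that R derives the levels from the data here. If a particular set or order of categories is needed, it can be given explicitly through the levels argument of factor().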
Why not use character vectors? An advantage of using factors is that they are generally more efficient than character vectors because the category labels are stored only once. Rather than storing FEMALE, the computer ...
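The storage idea described above can be inspected directly. The sketch below assumes a small, hypothetical vector of gender values (the names gender_chr and gender_fct are illustrative) and shows that a factor keeps each label once, as a level, while the elements themselves are held as integer codes.

```r
# Hypothetical gender values, kept once as character and once as a factor
gender_chr <- c("MALE", "MALE", "FEMALE")
gender_fct <- factor(gender_chr)

# The distinct labels are stored once, as the levels attribute:
# "FEMALE" "MALE"
levels(gender_fct)

# Each element is an integer index into those levels: 2 2 1
as.integer(gender_fct)

# The underlying storage type of a factor is "integer"
typeof(gender_fct)
```

The codes follow the order of the levels, which is alphabetical by default, so FEMALE maps to 1 and MALE to 2 in this sketch.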