Factors

Recall from Chapter 1, Introducing Machine Learning, that features representing a characteristic with categories of values are known as nominal. Although it is possible to use a character vector to store nominal data, R provides a data structure known as a factor specifically for this purpose. A factor is a special case of vector that is solely used for representing nominal variables. In the medical dataset we are building, we might use a factor to represent gender, because it takes one of two categories: MALE and FEMALE.
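As a minimal sketch of the idea, a factor can be built from a character vector with base R's factor() function. The variable name gender and the three values below are illustrative assumptions, not data from the medical dataset itself.

```r
# Hypothetical gender values for three patients (illustrative data only)
gender <- factor(c("MALE", "FEMALE", "MALE"))

# Printing a factor shows its values followed by its set of category levels
print(gender)

# levels() returns the distinct categories; by default they are sorted
gender_levels <- levels(gender)
print(gender_levels)
```

Because the levels are derived from the distinct values in the input, no separate list of categories has to be supplied in this simple case.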

Why not use character vectors? An advantage of using factors is that they are generally more efficient than character vectors because the category labels are stored only once. Rather than storing MALE, MALE, FEMALE, the computer ...
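To illustrate the storage point, the sketch below uses base R's levels() and as.integer() to look inside a factor. The example values are assumed for illustration; the code only shows how R separates the labels from the per-element codes and makes no claim about memory savings for any particular dataset.

```r
# Hypothetical values, matching the MALE, MALE, FEMALE example in the text
gender <- factor(c("MALE", "MALE", "FEMALE"))

# Each category label appears only once in the levels attribute
labels_once <- levels(gender)
print(labels_once)   # "FEMALE" "MALE"

# Each element is held as an integer index into those levels
codes <- as.integer(gender)
print(codes)         # 2 2 1
```

The codes follow the default alphabetical ordering of the levels, so FEMALE is coded as 1 and MALE as 2 here.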
