Factors
If you recall from Chapter 1, Introducing Machine Learning, features that represent a characteristic with categories of values are known as nominal. Although it is possible to use a character vector to store nominal data, R provides a data structure known as a factor specifically for this purpose. A factor is a special case of vector that is solely used for representing nominal variables. In the medical dataset we are building, we might use a factor to represent gender, because it uses two categories: MALE and FEMALE.
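As a minimal sketch of the idea, a factor can be built from a character vector with R's factor() function. The three values below are hypothetical patients chosen for illustration, not data from the original dataset.

```r
# Hypothetical gender values for three patients (illustrative assumption).
gender <- factor(c("MALE", "FEMALE", "MALE"))

# Printing a factor shows the values followed by its set of levels.
print(gender)

# levels() returns the distinct categories; by default they are sorted.
gender_levels <- levels(gender)
print(gender_levels)
```

Because the levels are derived from the sorted unique values, FEMALE is listed before MALE even though MALE appears first in the data.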
Why not use character vectors? An advantage of using factors is that they are generally more efficient than character vectors because the category labels are stored only once. Rather than storing MALE, MALE, FEMALE, the computer ...
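To make the storage idea concrete, the sketch below inspects how R represents a factor internally: as a vector of integer codes plus a single set of category labels. The vector contents and the names codes and category_labels are illustrative assumptions.

```r
# Same three values as in the text: MALE, MALE, FEMALE.
gender <- factor(c("MALE", "MALE", "FEMALE"))

# The underlying storage type of a factor is integer, not character.
storage_type <- typeof(gender)

# Each element is an index into the levels; the labels are kept once.
codes <- as.integer(gender)
category_labels <- levels(gender)

# Indexing the labels by the codes recovers the original values.
recovered <- category_labels[codes]
print(recovered)
```

Here each observation holds only a small integer, while the strings themselves live in the levels attribute.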