Chapter 8. Machine Learning with the caret Package

So far, we’ve been doing machine learning in a very ad hoc manner. We have some data, we want to fit a model to it, and then we tune the model to give us the best result, based on whatever sampling we might have done and on how the data itself is organized. A lot of this relies on the ability to recognize when to use certain algorithms. Just by visualizing a set of data, we can usually tell whether fitting a linear regression to it makes sense. Likewise, we’ve seen examples for which the data is better suited to clustering via the k-means algorithm or something similar.

One issue that we’ve seen is that a lot of these algorithms can be very different from one another. The options for the lm() function are quite different from those of the nnet() function. Surely there exists something that provides a common interface for all these different yet commonly used algorithms. We’re in luck: R’s caret package offers a powerful set of tools to help streamline our model building.
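To make the idea of a common interface concrete, here is a minimal sketch of fitting a linear regression and a neural network through caret's train() function. It assumes the caret and nnet packages are installed, and it uses the built-in mtcars data with an arbitrary formula (mpg ~ wt + hp) and 5-fold cross-validation. Those choices are illustrative assumptions, not necessarily the examples used later in the chapter.

```r
# Assumes the caret and nnet packages are installed; mtcars ships with R.
library(caret)

set.seed(123)  # make the resampling reproducible

# Resampling setup shared by both models: 5-fold cross-validation.
ctrl <- trainControl(method = "cv", number = 5)

# Linear regression through caret's train() interface.
fit_lm <- train(mpg ~ wt + hp, data = mtcars,
                method = "lm", trControl = ctrl)

# A single-hidden-layer neural network: same call, different method string.
# linout = TRUE and trace = FALSE are passed through to nnet::nnet().
fit_nnet <- train(mpg ~ wt + hp, data = mtcars,
                  method = "nnet", trControl = ctrl,
                  preProcess = c("center", "scale"),
                  linout = TRUE, trace = FALSE)

# Both fitted objects support the same predict() call.
pred_lm <- predict(fit_lm, newdata = mtcars)
pred_nnet <- predict(fit_nnet, newdata = mtcars)
```

The only things that change between the two fits are the method string and the model-specific arguments forwarded to the underlying function. The resampling setup and the prediction step stay the same.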

The name “caret” is an acronym that stands for “Classification and Regression Training,” but the package itself is capable of much more. In the R ecosystem, there are hundreds of machine learning packages. Becoming familiar with the quirks and special functionality of each one can be a daunting task. Lucky for us, caret provides a common interface to many of these packages. Caret also provides great functionality ...
