So far, we’ve been doing machine learning in a very ad hoc manner. We
have some data, we want to fit a model to it, and then we tune the model to
give us the best result based on whatever sampling processes we might have
done and depending on how the data itself is organized. A lot of this
relies on the ability to recognize when to use certain algorithms.
Just by visualizing a dataset, we can usually tell whether a
linear regression is a sensible fit for it. Likewise, we’ve seen
examples in which the data is better suited to a clustering
algorithm or something similar.
One issue that we’ve seen is that the interfaces to these algorithms can be
very different from one another. The options for the lm() function are quite
different from those of the nnet() function. Surely there exists something that
provides a common interface for all these different yet commonly used
algorithms. We’re in luck with R in that the caret package offers a
powerhouse of tools to help streamline our model building.
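To make the idea of a common interface concrete, here is a minimal sketch that fits a linear regression and a small neural network through caret's train() function. It assumes the caret and nnet packages are installed; the built-in mtcars dataset, the choice of predictors, the five-fold cross-validation, and the single tuning-grid row (size = 3, decay = 0.1) are illustrative choices, not recommendations.

```r
library(caret)  # assumes caret (and nnet, used by method = "nnet") are installed

set.seed(123)

# Shared resampling setup: 5-fold cross-validation (an illustrative choice)
ctrl <- trainControl(method = "cv", number = 5)

# Linear regression through caret
fit_lm <- train(mpg ~ wt + hp, data = mtcars,
                method = "lm",
                trControl = ctrl)

# Neural network through the same function; only `method` and the
# model-specific arguments change. linout and trace are passed on to nnet().
fit_nn <- train(mpg ~ wt + hp, data = mtcars,
                method = "nnet",
                trControl = ctrl,
                preProcess = c("center", "scale"),
                tuneGrid = data.frame(size = 3, decay = 0.1),
                linout = TRUE, trace = FALSE)

# Prediction also uses one call for both models
pred_lm <- predict(fit_lm, newdata = mtcars)
pred_nn <- predict(fit_nn, newdata = mtcars)
```

Both fits are objects of class "train", so the same predict() call, and the same resampling summaries, apply regardless of which underlying algorithm was used.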
The name “caret” is an acronym that stands for “Classification and Regression
Training,” but the package itself is capable of much more. In the R ecosystem, there are
hundreds of machine learning packages. Becoming familiar with the quirks
and special functionality of each one can be a daunting task. Luckily for us,
caret provides a common interface for many of these packages. Caret also provides great functionality ...