In this chapter, we show off a miscellany of machine learning models available in R. Although the main algorithms that we’ve covered thus far make up the majority of models in common use, I wanted to include this chapter to provide a more comprehensive view of the machine learning ecosystem in R.
We cover classification again, but through the lens of Bayesian statistics, a popular branch of statistics that also serves as a bridge to other algorithms built on similar logic. We also cover principal component analysis, support vector machines, and k-nearest neighbors algorithms.
One way to do classification with probabilities is through the use of Bayesian statistics. Although this field can have a rather steep learning curve, essentially we are trying to answer the question, “Based on the features we have, what is the probability that the outcome is class X?” A naive Bayes classifier answers this question with a rather bold assumption: all of the predictors we have are independent of one another. The advantage of this assumption is that it drastically reduces the complexity of the calculations we’re doing.
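To make the idea concrete, here is a minimal sketch of fitting a naive Bayes classifier in R using the `naiveBayes()` function from the e1071 package (one of several R packages that implement this classifier); the built-in `iris` data set is used purely for illustration:

```r
# A minimal naive Bayes sketch using the e1071 package (assumed installed)
library(e1071)

# Fit a model predicting iris species from the four flower measurements;
# each predictor is treated as conditionally independent given the class
model <- naiveBayes(Species ~ ., data = iris)

# Predicted class labels for the training data
pred <- predict(model, iris)

# Per-class probabilities, answering "what is the probability the outcome
# is class X?" for each observation
probs <- predict(model, iris, type = "raw")

# Compare predictions against the true species
table(pred, iris$Species)
```

Even with its independence assumption, the classifier recovers the correct species for the vast majority of observations here; the point is that the class probabilities come from multiplying simple per-feature probabilities rather than modeling the features jointly.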
Bayesian statistics relies heavily on the multiplication of probabilities. Let’s do a quick primer on this so you’re up to speed. Suppose that I ride my bike in 100 races and I win 54 of them (if only!). The probability of me winning a race, therefore, is just the number of times I’ve won divided ...