Chapter 4. Regression in a Nutshell
In Chapter 1, where we briefly explored the realms of machine learning, we began with linear regression because you have probably come across it at some point in your mathematical training. The process is fairly intuitive and easier to explain as a first concept than some other machine learning models. Many areas of data analysis also rely on regression modeling, from a business trying to forecast its profits to scientists at the frontiers of research trying to uncover the laws that govern the universe. Regression appears in any scenario that calls for predicting one quantity from others; a prediction against time is a common example. In this chapter, we examine in depth how to use regression modeling in R, and we also explore some caveats and pitfalls to be aware of along the way.
The main motivation behind regression is to build an equation by which we can learn more about our data. There is no hard-and-fast rule about which type of regression model to fit to your data, however. Choosing between a logistic regression, linear regression, or multivariate regression model depends on the problem and the data that you have. You could fit a straight line to a given series of data points, but is that always the best choice? Ideally, we are after a balance of simplicity and explanatory power. A straight line fit to a complex series of data might be simple, but it might not describe the whole picture. On the other hand, having a very simple set of data ...
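To make the trade-off between simplicity and explanatory power concrete, here is a minimal sketch in R. It assumes simulated data with a curved relationship; the variable names (x, y, fit_line, fit_quad) and the choice of a quadratic term are illustrative assumptions, not part of any dataset used in this book. It fits a straight line and a slightly more complex model to the same points and compares them.

```r
# Simulated data with a curved (quadratic) relationship.
# All names and parameter values here are hypothetical, chosen for illustration.
set.seed(42)
x <- seq(0, 10, length.out = 50)
y <- 2 + 0.5 * x^2 + rnorm(length(x), sd = 2)

# Simple model: a straight line through the points.
fit_line <- lm(y ~ x)

# Slightly more complex model: adds a squared term.
fit_quad <- lm(y ~ x + I(x^2))

# Compare explanatory power using adjusted R-squared,
# which penalizes the extra term in the larger model.
r2_line <- summary(fit_line)$adj.r.squared
r2_quad <- summary(fit_quad)$adj.r.squared
print(c(line = r2_line, quadratic = r2_quad))

# F-test for nested models: does the squared term earn its place?
print(anova(fit_line, fit_quad))
```

On data like this, the straight line is the simpler equation, but the model with the squared term should explain noticeably more of the variation. Which one is "better" still depends on the problem at hand.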