Multicollinearity

As mentioned in the opening paragraphs of this section, the number of variables is a little high for comfort. When there are many variables, the chance of multicollinearity increases. Multicollinearity occurs when two or more variables are correlated with each other.

From a cursory glance at the data, we can tell that this is in fact the case. A simple example is that GarageArea is correlated with GarageCars. In real life, this makes sense: a garage that can hold two cars would logically be larger in area than a garage that can only hold one. Likewise, zoning is highly correlated with the neighborhood.

A good way to think about the variables is in terms of the information they contain. Sometimes, ...
