Visualization and variable reduction

In the previous section, the housing data underwent extensive pre-processing, and it is now ready for further analysis. We begin with visualization. Because the dataset has many variables, viewing all of the plots on the interactive R graphics device is awkward. As in earlier chapters, where we visualized random forests and other large, complex structures, we will open a PDF device and store the graphs in it. In the housing dataset, the main variable of interest is the housing price, so we first name the output variable SalePrice. We need to visualize the data in a way that brings out the relationship between each of the numerous variables and SalePrice. The independent variables can be either numeric or ...
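
The idea of sending many plots to a PDF device can be sketched as follows. This is a minimal illustration, not the book's code: the `housing` data frame here is a small simulated stand-in, and the column names `LotArea` and `Neighborhood`, as well as the output file name, are assumptions made for the example. Numeric variables are drawn as scatter plots against SalePrice, and the one non-numeric variable is drawn as a box plot.

```r
# Hypothetical stand-in for the housing data; only SalePrice is named in the text.
set.seed(1)
housing <- data.frame(
  SalePrice    = round(rnorm(50, mean = 180000, sd = 40000)),
  LotArea      = round(runif(50, min = 5000, max = 15000)),
  Neighborhood = factor(sample(c("A", "B", "C"), 50, replace = TRUE))
)

# Assumed output location; each plot becomes one page of the PDF.
out_file <- file.path(tempdir(), "housing_plots.pdf")
pdf(out_file)

# Every column other than the output variable is treated as a predictor.
predictors <- setdiff(names(housing), "SalePrice")

for (v in predictors) {
  x <- housing[[v]]
  if (is.numeric(x)) {
    # Numeric predictor: scatter plot against SalePrice
    plot(x, housing$SalePrice, xlab = v, ylab = "SalePrice",
         main = paste("SalePrice vs", v))
  } else {
    # Non-numeric predictor: box plot of SalePrice by level
    boxplot(housing$SalePrice ~ x, xlab = v, ylab = "SalePrice",
            main = paste("SalePrice by", v))
  }
}

# Close the device so the file is written to disk.
invisible(dev.off())
```

Writing to a file rather than the screen device keeps one plot per page, which makes it easier to page through a long list of variables.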
