Visualization and variable reduction
In the previous section, the housing data underwent extensive pre-processing, and we are now ready to analyze it further. We begin with visualization. Because the dataset has many variables, viewing all of the plots on the default R graphics device is difficult. So, as in earlier chapters where we visualized random forests and other large, complex structures, we will open a PDF device and save the graphs to it. In the housing dataset, the variable of main interest is the housing price, so we first designate SalePrice as the output variable. We need to visualize the data in a way that reveals the relationship between the numerous independent variables and SalePrice. The independent variables can be either numeric or ...