July 2017
Intermediate to advanced
796 pages
18h 55m
English
To demonstrate clustering further, we will apply an unsupervised learning technique from Spark MLlib to the Saratoga NY Homes dataset, downloaded from http://course1.winona.edu/bdeppa/Stat%20425/Datasets.html. The dataset contains several features of houses located in the suburbs of New York City: price, lot size, waterfront, age, land value, new construct, central air, fuel type, heat type, sewer type, living area, pct.college, bedrooms, fireplaces, bathrooms, and the number of rooms. The following table shows only a few of these features:
| Price | Lot Size | Water Front | Age | Land Value | Rooms |
| 132,500 | 0.09 | 0 | 42 | 5,000 | 5 |
| 181,115 | 0.92 | 0 | 0 | 22,300 | 6 |
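Clustering algorithms work on numeric feature vectors, so each row of the table has to be converted into a list of numbers first. The following is a minimal sketch of that step in plain Python, independent of Spark; the names `HEADER`, `ROWS`, and `to_feature_vector` are hypothetical and chosen only for illustration, and the sketch assumes that the commas in the table are thousands separators.

```python
# Column names and the two sample rows from the table, kept as raw strings.
HEADER = ["Price", "Lot Size", "Water Front", "Age", "Land Value", "Rooms"]
ROWS = [
    ["132,500", "0.09", "0", "42", "5,000", "5"],
    ["181,115", "0.92", "0", "0", "22,300", "6"],
]


def to_feature_vector(row):
    """Convert one row of strings to floats (assumes ',' is a thousands separator)."""
    return [float(cell.replace(",", "")) for cell in row]


# One numeric vector per house, in the same column order as HEADER.
vectors = [to_feature_vector(row) for row in ROWS]

# Print each feature side by side for the two sample houses.
for name, first, second in zip(HEADER, *vectors):
    print(f"{name}: {first} vs {second}")
```

Vectors of this shape are what a clustering routine would take as input. Note that the features are on very different scales (price in the hundreds of thousands, lot size below 1), which may matter for distance-based methods.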