Geographic Differences

In this section we explore the changes in home prices in different cities in the Bay Area. Because we are looking at average prices, we must take care not to include cities with only a few sales. We decided to focus on all cities with an average of at least 10 sales per week. This gave us 58 cities (24% of the 245 cities in the data) with 428,415 sales (82% of the sales).

We then calculated the average weekly house price. Figure 18-6 shows these prices, with each city drawn with a different line. Statisticians have an evocative name for this type of display: the spaghetti plot. It's very hard to see anything in the big jumble of lines. One method of improvement is to smooth each line, removing short-term variation and allowing us to focus on the long-term trends we are looking for.

Average sale price for each week for each city. This type of plot is often called a spaghetti plot. It suggests the need for smoothing, because the week-to-week variation in the curves makes it impossible to detect trends.

Figure 18-6. Average sale price for each week for each city. This type of plot is often called a spaghetti plot. It suggests the need for smoothing, because the week-to-week variation in the curves makes it impossible to detect trends.

To create smooth curves, we used generalized additive models (GAM), a generalization of linear models (Wood 2006). This method fits smooth curves by optimizing the trade-off between being close to the data and being very smooth, in effect removing noisy short-term effects and emphasizing the long-term trend. This is exactly what we need: we are not interested ...

Get Beautiful Data now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.