In this chapter you will work through an example project end to end, pretending to be a recently hired data scientist at a real estate company.1 Here are the main steps you will go through:
Look at the big picture.
Get the data.
Discover and visualize the data to gain insights.
Prepare the data for Machine Learning algorithms.
Select a model and train it.
Fine-tune your model.
Present your solution.
Launch, monitor, and maintain your system.
When you are learning about Machine Learning, it is best to experiment with real-world data, not artificial datasets. Fortunately, there are thousands of open datasets to choose from, ranging across all sorts of domains. Here are a few places you can look to get data:
Popular open data repositories
Meta portals (they list open data repositories)
Other pages listing many popular open data repositories
In this chapter we’ll use the California Housing Prices dataset from the StatLib repository2 (see Figure 2-1). This dataset is based on data from the 1990 California census. It is not exactly recent (a nice house in the Bay Area was still affordable at the time), but it has many qualities for learning, so we will pretend it is recent data. ...