In this chapter you will go through an example project end to end, pretending to be a recently hired data scientist in a real estate company.¹ Here are the main steps you will go through:
1. Look at the big picture.
2. Get the data.
3. Discover and visualize the data to gain insights.
4. Prepare the data for Machine Learning algorithms.
5. Select a model and train it.
6. Fine-tune your model.
7. Present your solution.
8. Launch, monitor, and maintain your system.
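To make the shape of these steps concrete, here is a minimal sketch of the middle of the workflow (get the data, set aside a test set, train a model, evaluate it) using only the Python standard library. Everything in it is an assumption made for illustration: the function names, the single `income` feature, and the synthetic data, which is only a placeholder so the sketch runs on its own. It is not the dataset or the code used in the rest of this chapter.

```python
import math
import random


def get_data(n=200, seed=42):
    """Get the data. Synthetic placeholder: a real project would load a real dataset."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        income = rng.uniform(1.0, 10.0)  # hypothetical feature
        price = 50.0 + 40.0 * income + rng.gauss(0.0, 10.0)  # hypothetical target
        rows.append((income, price))
    return rows


def split_train_test(rows, test_ratio=0.2, seed=42):
    """Prepare the data: shuffle a copy and set aside a test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def train_linear_model(train):
    """Select a model and train it: fit y = intercept + slope * x by least squares."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def rmse(model, rows):
    """Evaluate the model: root mean squared error on the given rows."""
    intercept, slope = model
    squared_errors = [(intercept + slope * x - y) ** 2 for x, y in rows]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


rows = get_data()
train_set, test_set = split_train_test(rows)
model = train_linear_model(train_set)
test_rmse = rmse(model, test_set)
print(f"intercept={model[0]:.1f} slope={model[1]:.1f} test RMSE={test_rmse:.1f}")
```

The test set is split off before any model is fitted, so the final error is measured on rows the model has never seen. The remaining steps (framing the problem, exploring the data, fine-tuning, presenting, and maintaining the system) do not reduce to a few lines of code and are left out of this sketch.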
When you are learning about Machine Learning, it is best to experiment with real-world data, not artificial datasets. Fortunately, there are thousands of open datasets to choose from, ranging across all sorts of domains. Here are a few places you can look to get data:
Popular open data repositories:
Meta portals (they list open data repositories):
Other pages listing many popular open data repositories:
In this chapter we chose the California Housing Prices dataset from the StatLib repository² (see Figure 2-1). This dataset was based on data from the 1990 California census. It is not exactly recent (you could still afford a nice house in the Bay Area at the time), but it has many qualities for learning, so we will pretend it ...