Just as in the first chapter, we have to scale the data. The income values are much larger than the age values, so without rescaling, income would dominate any distance computation and diminish the impact of age, which actually has good predictive power in this kind of problem. Older people are expected to have had more time than younger people to settle down, save money, and buy a house.
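To make the scale problem concrete, the following minimal Python sketch compares the Euclidean distance between two people from the table below (aged 23 and 37, earning 50000 and 34000 USD) before and after min-max rescaling. The bounds used for rescaling (ages 20 to 52, incomes 30000 to 130000 USD) are an assumption inferred from the scaled values in the table, not taken from the full data set.

```python
import math

# Two people from the table: (age in years, annual income in USD).
a = (23, 50000)
b = (37, 34000)

# Unscaled: the 16000 USD income gap swamps the 14-year age gap.
raw = math.dist(a, b)
age_share_raw = (a[0] - b[0]) ** 2 / raw ** 2  # age's share of the squared distance


def rescale(x, lo, hi):
    """Min-max rescaling: map x from [lo, hi] linearly onto [0, 1]."""
    return (x - lo) / (hi - lo)


# ASSUMPTION: bounds inferred from the scaled table values
# (ages 20..52, incomes 30000..130000); the full data set is not shown.
def scale(person):
    age, income = person
    return (rescale(age, 20, 52), rescale(income, 30000, 130000))


sa, sb = scale(a), scale(b)
scaled = math.dist(sa, sb)
age_share_scaled = (sa[0] - sb[0]) ** 2 / scaled ** 2

print(f"raw distance {raw:.3f}, age share {age_share_raw:.2e}")
print(f"scaled distance {scaled:.3f}, age share {age_share_scaled:.2f}")
```

In raw units, age accounts for well under a millionth of the squared distance; after rescaling, it accounts for most of it for this particular pair, so both attributes can actually influence which neighbors are nearest.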

We apply the same rescaling as in Chapter 1, Classification Using K Nearest Neighbors, and obtain the following table:

Age  Scaled age   Annual income (USD)   Scaled annual income   House ownership status
23   0.09375      50000                 0.2                    non-owner
37   0.53125      34000                 0.04                   non-owner
48   0.875        40000                 0.1                    owner
52   1            30000                 0                      non-owner
28   0.25         95000                 0.65                   owner
25   0.15625      78000                 ...
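The scaled columns above are consistent with min-max rescaling, (x - min) / (max - min), with ages ranging from 20 to 52 and incomes from 30000 to 130000 USD. Those bounds are an assumption inferred from the rows shown (for instance, age 52 maps to 1 and income 30000 maps to 0); the full data set is not listed in this section. Under that assumption, this minimal Python sketch reproduces the complete rows of the table:

```python
# Min-max rescaling: maps each value x to (x - lo) / (hi - lo).
# ASSUMPTION: the bounds below were inferred from the scaled values in the
# table (e.g. age 52 -> 1, income 30000 -> 0); the full data set is not shown.
AGE_MIN, AGE_MAX = 20, 52
INCOME_MIN, INCOME_MAX = 30000, 130000


def rescale(x, lo, hi):
    """Linearly map x from [lo, hi] onto [0, 1]."""
    return (x - lo) / (hi - lo)


# (age, annual income in USD, house ownership status) for the complete rows.
rows = [
    (23, 50000, "non-owner"),
    (37, 34000, "non-owner"),
    (48, 40000, "owner"),
    (52, 30000, "non-owner"),
    (28, 95000, "owner"),
]

scaled = [
    (rescale(age, AGE_MIN, AGE_MAX), rescale(income, INCOME_MIN, INCOME_MAX), status)
    for age, income, status in rows
]

for (age, income, _), (s_age, s_income, status) in zip(rows, scaled):
    print(f"{age:>3} {s_age:.5f} {income:>6} {s_income:.2f} {status}")
```

Hard-coding the bounds keeps the sketch short; with the whole data set available, they would instead be computed as the minimum and maximum of each column.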
