December 2018
A quick look at the first rows of the dataset shows that our DataFrame contains only numeric values. However, the dataset description tells us that many of the features are in fact categorical; the numbers in those columns are just codes that stand for the categories.
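The check described above can be sketched as follows. This is a minimal illustration: the small DataFrame is a hypothetical stand-in for the first rows of the dataset, and the column names and the list of categorical columns are assumptions made for the example.

```python
import pandas as pd

# Hypothetical stand-in for the first rows of the dataset; the column
# names and values are illustrative only.
df = pd.DataFrame({
    "sex": [2, 2, 1, 2],
    "education": [2, 2, 1, 3],
    "age": [24, 26, 34, 37],
})

print(df.head())

# Every column is stored as a number, even the ones that encode categories.
all_numeric = all(pd.api.types.is_numeric_dtype(t) for t in df.dtypes)
print(all_numeric)

# Assumption: the dataset description tells us these columns are categorical.
categorical_cols = ["sex", "education"]
df = df.astype({col: "category" for col in categorical_cols})
print(df.dtypes)
```

Marking the columns with the pandas category dtype records the intent in the DataFrame itself, so later steps can tell coded labels apart from true quantities such as age.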
Be careful when using numbers to represent categories. The main problem with this approach is that many models (scikit-learn models by default, in the case of Python) will treat these numbers as values of a numerical variable. For instance, many models will treat the 2 in the sex column as the quantity 2, not as the label for a female customer. Likewise, the 1s in that column will be considered and treated ...
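To make the problem concrete, here is a small sketch with made-up values. It shows that integer codes allow arithmetic that has no meaning for categories, and then shows one common alternative, indicator columns built with `pd.get_dummies`. That alternative is an assumption for illustration, not necessarily the approach the text takes next.

```python
import pandas as pd

# Illustrative values only; 2 stands for a female customer, as in the text.
sex = pd.Series([2, 2, 1, 2], name="sex")

# As plain integers, the codes support arithmetic that is meaningless
# for categories, such as an "average sex".
mean_code = sex.mean()
print(mean_code)

# One common alternative (an assumption here): one indicator column
# per code, so no ordering or distance between categories is implied.
dummies = pd.get_dummies(sex, prefix="sex")
print(list(dummies.columns))
```

With indicator columns, a model sees each category as a separate yes/no feature instead of a point on a numeric scale.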