February 2018
Intermediate to advanced
378 pages
10h 14m
English
Most machine learning algorithms can't work with categorical variables directly, so we usually convert them to one-hot vectors (statisticians prefer to call them dummy variables). Let's convert first, and then I will explain what this is:
In []:
features = pd.get_dummies(features, columns=['color'])
features.head()
Out[]:
|   | length | fluffy | color_light black | color_pink gold | color_purple polka-dot | color_space gray |
|---|--------|--------|-------------------|-----------------|------------------------|------------------|
| 0 | 27.545139 | True | 0 | 1 | 0 | 0 |
| 1 | 12.147357 | False | 0 | 1 | 0 | 0 |
| 2 | 23.454173 | True | 1 | 0 | 0 | 0 |
| 3 | 29.956698 | True | 0 | 1 | 0 | 0 |
| 4 | 34.884065 | True | 1 | 0 | 0 | 0 |
So now, instead of one column, color, we have four columns: color_light black ...
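To make the idea concrete, here is a minimal hand-rolled sketch of one-hot encoding in plain Python. It is only an illustration of the concept, not how pandas implements `get_dummies`; the helper name `one_hot` and the toy `colors` list (the colors of the five rows shown above) are assumptions made for this example.

```python
def one_hot(values, prefix):
    """Hand-rolled one-hot encoding of a list of categorical values.

    Hypothetical helper for illustration; returns a dict that maps
    each new column name to a list of 0/1 flags, one flag per row.
    """
    # One output column per distinct category, in sorted order.
    categories = sorted(set(values))
    return {
        f"{prefix}_{category}": [1 if v == category else 0 for v in values]
        for category in categories
    }


# Toy data (assumed): the colors of the five rows in the table above.
colors = ["pink gold", "pink gold", "light black", "pink gold", "light black"]
encoded = one_hot(colors, prefix="color")

for name, column in encoded.items():
    print(name, column)
```

Each row gets exactly one 1 across the new columns, which is where the name one-hot comes from. Only two columns appear here because the toy list contains only two of the four colors.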