January 2020
Beginner to intermediate
372 pages
10h
English
In this recipe, we performed one-hot encoding of categorical variables using pandas and scikit-learn.
We loaded the dataset and split it into train and test sets using scikit-learn's train_test_split() function. We then applied pandas' get_dummies() function to the A4 variable, setting drop_first=True to drop the first binary variable and so obtain k-1 binary variables for its k categories. After that, we applied get_dummies() to all of the categorical variables in the dataset, which returned a dataframe of binary variables representing the categories of each feature.
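The two get_dummies() calls described above can be sketched as follows. This is a minimal illustration, not the recipe's own code: the toy dataframe, its values, and the A5 column are assumptions made for the example; only the A4 variable name and the drop_first=True setting come from the text.

```python
import pandas as pd

# Hypothetical toy data standing in for the recipe's dataset.
# Only the column name A4 comes from the text; A5 and all values are made up.
df = pd.DataFrame({
    "A4": ["u", "y", "l", "u", "y"],
    "A5": ["g", "p", "gg", "g", "p"],
})

# One variable: A4 has k = 3 categories (l, u, y), so drop_first=True
# drops the first one and leaves k-1 = 2 binary variables.
a4_dummies = pd.get_dummies(df["A4"], drop_first=True)

# Whole dataframe: every categorical column is encoded, and the new
# column names are prefixed with the original variable name.
all_dummies = pd.get_dummies(df, drop_first=True)

print(list(a4_dummies.columns))
print(list(all_dummies.columns))
```

Passing a single column returns binary variables named after the categories alone, while passing the whole dataframe prefixes each binary variable with its source column, which makes the result easier to trace back to the original features.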
Finally, we performed one-hot encoding using OneHotEncoder() ...
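As a rough sketch of the scikit-learn route, the encoder can be fitted on the train set and then applied to both sets. The toy data, the split settings, and the handle_unknown="ignore" choice are assumptions made for illustration; the text does not state which parameters the recipe uses.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data; the column names and values are illustrative only.
data = pd.DataFrame({
    "A4": ["u", "y", "l", "u", "y", "l", "u", "y"],
    "A5": ["g", "p", "gg", "g", "p", "gg", "g", "p"],
})

# Split first so the encoder learns its categories from the train set only.
X_train, X_test = train_test_split(data, test_size=0.25, random_state=0)

# handle_unknown="ignore" (an assumption here) encodes categories that were
# not seen during fit as all zeros instead of raising an error.
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(X_train)

# transform() returns a sparse matrix by default; toarray() makes it dense.
train_encoded = encoder.transform(X_train).toarray()
test_encoded = encoder.transform(X_test).toarray()

print(encoder.categories_)
print(train_encoded.shape, test_encoded.shape)
```

Unlike get_dummies(), a fitted encoder stores the categories it learned, so the train and test sets always end up with the same columns in the same order.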