June 2020
Intermediate to advanced
382 pages
11h 39m
English
Many machine learning algorithms require all the features to be continuous variables. It means that if some of the features are category variables, we need to find a strategy to convert them into continuous variables. One-hot encoding is one of the most effective ways of performing this transformation. For this particular problem, the only category variable we have is Gender. Let's convert that into a continuous variable using one-hot encoding:
enc = sklearn.preprocessing.OneHotEncoder()enc.fit(dataset.iloc[:,[0]])onehotlabels = enc.transform(dataset.iloc[:,[0]]).toarray()genders = pd.DataFrame({'Female': onehotlabels[:, 0], 'Male': onehotlabels[:, 1]})result = pd.concat([genders,dataset.iloc[:,1:]], axis=1, sort=False)