November 2018
Intermediate to advanced
556 pages
14h 42m
English
The dataset is already clean, so we don't need a data-cleaning step. However, if we analyze the standard deviation of each variable within each engine unit (unitid), we can see that set_3, sensor_1, sensor_5, sensor_10, sensor_16, sensor_18, and sensor_19 have a standard deviation of zero. A variable that never changes carries no information a model can learn from, so we can remove these variables.
The following code performs the standard deviation analysis:
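import pandas as pd

# standard deviation of each variable per engine unit
# (df is the dataset loaded earlier)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
df_std = df.groupby('unitid').std()
print(df_std == 0)

We can call df.groupby('unitid').describe() instead of std() to see other statistics as well, such as the count, mean, minimum, maximum, and quartiles.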
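Printing df_std == 0 shows a large Boolean table that you still have to scan by eye. A minimal sketch of how the zero-variance columns could be listed directly is shown here, assuming a small hypothetical DataFrame: the columns sensor_a and sensor_b and all of their values are made up for illustration and are not from the book's dataset.

```python
import pandas as pd

# Hypothetical toy data: two engines (unitid 1 and 2), three readings each.
# 'sensor_a' varies; 'sensor_b' is constant, like the zero-variance sensors above.
df = pd.DataFrame({
    "unitid":   [1, 1, 1, 2, 2, 2],
    "sensor_a": [518.2, 519.0, 520.1, 517.9, 518.5, 521.3],
    "sensor_b": [100.0, 100.0, 100.0, 100.0, 100.0, 100.0],
})

# Standard deviation of every column within each engine unit.
df_std = df.groupby("unitid").std()

# Treat a column as constant if its standard deviation is zero in every unit.
mask = (df_std == 0).all()
zero_std_cols = mask[mask].index.tolist()
print(zero_std_cols)  # ['sensor_b']
```

Because the check is done per unit, a column that is constant inside each unit but takes different values across units would also be flagged; applying df.std() to the whole DataFrame instead would not flag it.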
Then, we can remove the aforementioned variables, as follows:
# removing data that is not useful
df=df.drop(['set_3', 'sensor_1', 'sensor_5', ...
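Instead of hard-coding the column names, the same removal could be driven by the standard deviation analysis itself. The following is only a sketch under assumptions: drop_constant_columns is a hypothetical helper, the toy data is invented, and every column other than the grouping key is assumed to be numeric (newer pandas versions raise an error when std() meets non-numeric columns).

```python
import pandas as pd

def drop_constant_columns(df: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Hypothetical helper: drop columns whose std is zero within every group."""
    # The grouping key becomes the index, so it is never a drop candidate.
    per_group_std = df.groupby(group_col).std()
    mask = (per_group_std == 0).all()
    constant_cols = mask[mask].index.tolist()
    return df.drop(columns=constant_cols)

# Hypothetical toy data with one varying and one constant sensor.
df = pd.DataFrame({
    "unitid":   [1, 1, 1, 2, 2, 2],
    "sensor_a": [518.2, 519.0, 520.1, 517.9, 518.5, 521.3],
    "sensor_b": [100.0, 100.0, 100.0, 100.0, 100.0, 100.0],
})

cleaned = drop_constant_columns(df, "unitid")
print(cleaned.columns.tolist())  # ['unitid', 'sensor_a']
```

Deriving the list from the data keeps the drop step consistent with the analysis if the dataset changes. The trade-off is that the removed columns are no longer visible in the code, so printing constant_cols before dropping them may be worthwhile.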