December 2018
Beginner to intermediate
684 pages
English
Next, we remove rows and columns that are missing more than 20% of their values. This costs us about 6% of the observations and three columns:
rows_before, cols_before = data.shape
data = (data
        .dropna(axis=1, thresh=int(len(data) * .8))           # keep columns with at least 80% non-missing values
        .dropna(thresh=int(len(data.columns) * .8)))          # keep rows with at least 80% non-missing values
data = data.fillna(data.median())                             # impute the remaining gaps with each column's median
rows_after, cols_after = data.shape
print('{:,d} rows and {:,d} columns dropped'.format(rows_before - rows_after,
                                                   cols_before - cols_after))
2,985 rows and 3 columns dropped
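To see how the `thresh` argument behaves, here is a minimal sketch that applies the same chained calls to a small synthetic frame. The frame `toy`, its columns `f0` to `f9`, and the planted missing values are hypothetical and not taken from the book's dataset. Note that inside the chain, `len(toy.columns)` still refers to the original frame, so the row threshold is based on the column count before any column is dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical 10 x 10 frame with a controlled pattern of missing values
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(size=(10, 10)),
                   columns=[f'f{i}' for i in range(10)])
toy.loc[:1, 'f1'] = np.nan          # rows 0-1: 2 of 10 missing (20%), column kept
toy.loc[:2, 'f2'] = np.nan          # rows 0-2: 3 of 10 missing (30%), column dropped
toy.loc[9, ['f3', 'f4']] = np.nan   # row 9 ends up with 2 missing values
toy.loc[8, 'f5'] = np.nan           # row 8 ends up with 1 missing value

rows_before, cols_before = toy.shape
cleaned = (toy
           .dropna(axis=1, thresh=int(len(toy) * .8))     # columns need >= 8 non-NA values
           .dropna(thresh=int(len(toy.columns) * .8)))    # rows need >= 8 non-NA values
cleaned = cleaned.fillna(cleaned.median())                # fill the gaps left in f1 and f5
rows_after, cols_after = cleaned.shape
print('{:,d} rows and {:,d} columns dropped'.format(rows_before - rows_after,
                                                   cols_before - cols_after))
# prints: 1 rows and 1 columns dropped
```

Column `f1` (20% missing) survives while `f2` (30% missing) is dropped. Row 9, with two of its nine remaining values missing, falls below the row threshold of eight, and the gaps that survive in `f1` and `f5` are filled with their column medians.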
At this point, we have 51 features and the categorical identifier of the stock:
data.sort_index(axis=1).info()
MultiIndex: 47377 entries, (2014-01-02, Equity(24 [AAPL])) to (2015-12-31, Equity(47208 [GPRO]))
Data columns (total ...
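A few quick checks can confirm what `.info()` summarizes for a frame with a (date, asset) MultiIndex. The sketch below builds a hypothetical stand-in for `data`: it uses plain ticker strings instead of the zipline `Equity` objects shown in the output above, and the column names are made up. It passes `axis` to `sort_index` as a keyword, which recent pandas versions require.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `data`: 3 business days x 2 tickers
dates = pd.date_range('2014-01-02', periods=3, freq='B')
idx = pd.MultiIndex.from_product([dates, ['AAPL', 'GPRO']],
                                 names=['date', 'asset'])
frame = pd.DataFrame(np.arange(18, dtype=float).reshape(6, 3),
                     index=idx, columns=['zscore', 'alpha', 'momentum'])

# Sort the columns alphabetically (axis passed as a keyword)
frame = frame.sort_index(axis=1)
print(list(frame.columns))               # ['alpha', 'momentum', 'zscore']

# Checks that mirror the summary printed by .info()
print(len(frame))                        # number of (date, asset) entries
print(frame.index[0], frame.index[-1])   # first and last index tuples
print(frame.isna().sum().sum())          # remaining missing values
frame.info()
```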