October 2017
Intermediate to advanced
532 pages
16h 10m
English
One of the many nice features of HDF5 files is their ability to preserve the data type of each column, which substantially reduces the memory needed. In this case, three of the columns are stored as a pandas category instead of as an object. Storing them as object leads to a roughly fourfold increase in memory usage:
>>> mem_cat = crime.memory_usage().sum()
>>> mem_obj = crime.astype({'OFFENSE_TYPE_ID':'object',
...                         'OFFENSE_CATEGORY_ID':'object',
...                         'NEIGHBORHOOD_ID':'object'}) \
...                .memory_usage(deep=True).sum()
>>> mb = 2 ** 20
>>> round(mem_cat / mb, 1), round(mem_obj / mb, 1)
(29.4, 122.7)
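The same effect can be reproduced without the crime file. The following is a minimal sketch on synthetic data: the three labels, the row count, and the names df_obj and df_cat are made up for illustration, and only the column name OFFENSE_TYPE_ID is borrowed from the text above. It compares the deep memory usage of one low-cardinality string column stored as object and as category.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the crime data: a single string column
# with only three distinct values repeated many times.
rng = np.random.default_rng(0)
labels = ['theft-bicycle', 'traffic-accident', 'criminal-mischief-other']
values = pd.Series(rng.choice(labels, size=100_000), dtype=object)
df_obj = pd.DataFrame({'OFFENSE_TYPE_ID': values})

# The same data stored as a pandas category: small integer codes
# plus one copy of each distinct label.
df_cat = df_obj.astype({'OFFENSE_TYPE_ID': 'category'})

# deep=True counts the Python string objects behind an object column.
mem_obj = df_obj.memory_usage(deep=True).sum()
mem_cat = df_cat.memory_usage(deep=True).sum()

mb = 2 ** 20
print(round(mem_cat / mb, 2), round(mem_obj / mb, 2))
print(mem_cat < mem_obj)
```

The exact ratio depends on how many distinct values a column has and how long the strings are, so the figures here will not match the ones reported for the crime data.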
In order to intelligently select and slice rows by date using the indexing operator, the index must contain date values. In step 2, we move the ...
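As a rough illustration of why the index must contain date values, the sketch below builds a tiny made-up DataFrame, places its date column in the index, and then slices with partial date strings. The column names date and value and the four rows are assumptions for the example, not the data or the exact steps of the recipe.

```python
import pandas as pd

# Hypothetical miniature data set with a datetime column.
df = pd.DataFrame({
    'date': pd.to_datetime(['2015-12-31 22:00', '2016-01-01 08:30',
                            '2016-01-15 12:00', '2016-02-03 09:45']),
    'value': [1, 0, 1, 1],
})

# Put the dates in the index (a DatetimeIndex) and sort it,
# so that string-based date slicing is well defined.
df = df.set_index('date').sort_index()

# Partial string indexing: every row in January 2016.
jan_2016 = df.loc['2016-01']

# A date range; both endpoints are inclusive and need not exist in the index.
span = df.loc['2016-01-10':'2016-02-28']

print(jan_2016)
print(span)
```

With a plain integer index, the same .loc calls would raise a KeyError or return nothing useful, because the date strings are only interpreted as dates when the index is a DatetimeIndex.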