May 2017
Intermediate to advanced
310 pages
8h 5m
English
Data collection is tedious, so once data has been collected, it should not be discarded lightly. A dataset with missing fields or attributes can still be useful. Several methods can be used to fill in the missing parts: substituting a global constant, substituting the mean value in the dataset, or supplying the values manually. The choice depends on the context and on how sensitive the intended use of the data is.
Take, for instance, the following data:
import numpy as np
import pandas as pd

# Three rows, three columns; the second row is missing its first two values.
data = pd.DataFrame([
    [4., 45., 984.],
    [np.nan, np.nan, 5.],
    [94., 23., 55.],
])
As we can see, the first two elements of the second row (data.iloc[1, 0] and data.iloc[1, 1]) are NaN, representing the fact that they ...
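The constant and mean strategies mentioned earlier can be sketched on this same frame. This is only a minimal illustration, assuming pandas' `fillna` is an acceptable way to do the substitution. The fill value `0.0` and the names `filled_constant` and `filled_mean` are assumptions made for the example, not part of the original text.

```python
import numpy as np
import pandas as pd

# Same frame as above: the second row is missing its first two values.
data = pd.DataFrame([
    [4., 45., 984.],
    [np.nan, np.nan, 5.],
    [94., 23., 55.],
])

# Option 1: replace every missing value with a global constant.
# 0.0 is an arbitrary, illustrative choice.
filled_constant = data.fillna(0.0)

# Option 2: replace each missing value with the mean of its own column.
# DataFrame.mean() skips NaN by default, so column 0 uses (4 + 94) / 2
# and column 1 uses (45 + 23) / 2.
filled_mean = data.fillna(data.mean())

print(filled_constant)
print(filled_mean)
```

Both calls return new frames and leave `data` untouched, so the original gaps remain available if a different strategy turns out to suit the context better.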