January 2015
Intermediate to advanced
360 pages
8h 50m
English
In the previous chapter, we defined some useful statistical functions to compute the mean and standard deviation and to normalize a value. We can use these functions to locate outliers in our trip data. We apply the mean() and stdev() functions to the distance value in each leg of a trip to get the population mean and standard deviation.
We can then use the z() function to compute a normalized value for each leg. If the normalized value is more than 3, the data is extremely far from the mean. If we reject these outliers, we have a more uniform set of data that's less likely to harbor reporting or measurement errors.
The following is how we can tackle this:
from stats import mean, stdev, z
dist_data = list(map(dist, ...
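As a minimal, self-contained sketch of the steps described above (not the book's own listing), the following defines stand-in versions of mean(), stdev(), and z() and applies them to a list of leg distances. The function bodies, the z(x, mu, sigma) signature, and the sample distances are assumptions made for illustration; the real stats module and trip data may differ.

```python
import math

def mean(samples):
    """Arithmetic mean of a sequence of numbers."""
    return sum(samples) / len(samples)

def stdev(samples):
    """Population standard deviation (divides by N, not N - 1)."""
    mu = mean(samples)
    return math.sqrt(sum((x - mu) ** 2 for x in samples) / len(samples))

def z(x, mu, sigma):
    """Normalized score: how many standard deviations x lies from mu."""
    return (x - mu) / sigma

# Hypothetical leg distances; the single 60.0 value plays the outlier.
distances = [9.5, 10.0, 10.5] * 6 + [10.0, 60.0]

mu = mean(distances)
sigma = stdev(distances)

# Split the legs using the threshold of 3 mentioned in the text.
outliers = [d for d in distances if z(d, mu, sigma) > 3]
kept = [d for d in distances if z(d, mu, sigma) <= 3]

print(outliers)
print(len(kept))
```

The comparison follows the text and only rejects values far above the mean. Wrapping the score in abs() would also reject values far below it, which may or may not be what the analysis needs. With a population standard deviation, a score above 3 is only possible when there are at least 11 samples, so very short trips cannot produce outliers under this rule.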