Chapter 7. Aggregations: min, max, and Everything in Between
A first step in exploring any dataset is often to compute various summary statistics. Perhaps the most common summary statistics are the mean and standard deviation, which allow you to summarize the “typical” values in a dataset, but other aggregations are useful as well (the sum, product, median, minimum and maximum, quantiles, etc.).
NumPy has fast built-in aggregation functions for working on arrays; we’ll discuss and try out some of them here.
Summing the Values in an Array
As a quick example, consider computing the sum of all values in an array. Python itself can do this using the built-in sum function:
In [1]: import numpy as np
        rng = np.random.default_rng()

In [2]: L = rng.random(100)
        sum(L)
Out[2]: 52.76825337322368
The syntax is quite similar to that of NumPy’s sum function, and in the simplest case the result is the same up to floating-point rounding in the last digits:
In [3]: np.sum(L)
Out[3]: 52.76825337322366
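The two outputs above agree except in their final digits. A minimal sketch of how one might check that agreement programmatically follows. The seed value, the variable names builtin_total and numpy_total, and the tolerance are assumptions chosen for illustration, not part of the original example.

```python
import math

import numpy as np

# Seeded only so the sketch is reproducible; the seed value is an assumption.
rng = np.random.default_rng(0)
L = rng.random(100)

builtin_total = sum(L)     # Python's built-in: adds elements one at a time, left to right
numpy_total = np.sum(L)    # NumPy's reduction, performed in compiled code

# Floating-point addition is not associative, so a different order of
# additions can change the last few digits of the result.
print(builtin_total, numpy_total)

# Compare with a tolerance rather than with == (tolerance is an assumption).
print(math.isclose(builtin_total, numpy_total, rel_tol=1e-12))
```

Comparing with a tolerance rather than exact equality is the usual choice here, because the two functions need not perform the additions in the same order.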
However, because it executes the operation in compiled code, NumPy’s version of the operation is computed much more quickly:
In [4]: big_array = rng.random(1000000)
        %timeit sum(big_array)
        %timeit np.sum(big_array)
Out[4]: 89.9 ms ± 233 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
        521 µs ± 8.37 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
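The %timeit magic is available only inside IPython or Jupyter. As a rough sketch, a similar comparison can be run in a plain Python script with the standard-library timeit module. The repetition counts and the names builtin_time and numpy_time are assumptions, and the measured times will vary with hardware and NumPy version.

```python
import timeit

import numpy as np

rng = np.random.default_rng()
big_array = rng.random(1_000_000)

# Repetition counts are arbitrary (an assumption): fewer runs for the slow
# built-in, more for the fast NumPy call, then average per call.
builtin_runs = 3
numpy_runs = 100

builtin_time = timeit.timeit(lambda: sum(big_array), number=builtin_runs) / builtin_runs
numpy_time = timeit.timeit(lambda: np.sum(big_array), number=numpy_runs) / numpy_runs

print(f"built-in sum: {builtin_time * 1e3:.2f} ms per call")
print(f"np.sum:       {numpy_time * 1e6:.2f} µs per call")
```

Unlike %timeit, this sketch reports a simple average without a standard deviation, so it is best read as an order-of-magnitude comparison.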
Be careful, though: the sum function and the np.sum function are not
identical, which can sometimes lead to confusion! In particular, their
optional arguments have different meanings (sum(x, ...