Before anything else, we need to take a look at the data itself, as well as its columns and rows. It's reasonable to start data exploration by understanding the following:
- What do specific values look like? For example, use df.head(N), df.tail(N), or df.sample(N) to retrieve (and print) the first N, last N, or N random rows from the dataset. For head and tail, N defaults to 5; for sample, it defaults to 1 (one row). Alternatively, the sample method can take a frac argument, which returns a fraction of the records: for example, df.sample(frac=0.25) returns 25% of the initial dataset. Note that printing omits some columns in the middle if there are too many of them.
- The overall shape of the dataset—the number ...
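
As a minimal sketch of the row-inspection calls described in the first bullet, the snippet below builds a small toy DataFrame and applies head, tail, and sample to it. The dataset and its column names (id, value) are made up for illustration and are not part of the original data; random_state is passed only so that the random draw is repeatable.

```python
import pandas as pd

# Hypothetical toy dataset (20 rows); the column names are illustrative only.
df = pd.DataFrame({
    "id": range(1, 21),
    "value": [x * 1.5 for x in range(1, 21)],
})

first_rows = df.head()      # first 5 rows (N defaults to 5)
last_rows = df.tail(3)      # last 3 rows
one_random = df.sample()    # 1 random row (n defaults to 1)

# 25% of the rows; random_state fixes the draw so reruns give the same rows.
quarter = df.sample(frac=0.25, random_state=0)

print(first_rows)
print(last_rows)
print(one_random)
print(quarter)
```

With 20 rows, frac=0.25 yields 5 rows. On a wide dataset the printed output may elide middle columns, as noted above.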