Calculating Descriptive Statistics
Problem
You want to characterize a dataset by computing general descriptive or summary statistics.
Solution
Many common descriptive statistics, such as mean and standard deviation, can be obtained by applying aggregate functions to your data. Others, such as median or mode, can be calculated based on counting queries.
Discussion
Suppose you have a table testscore containing
observations representing subject ID, age, sex, and test score:
mysql> SELECT subject, age, sex, score FROM testscore ORDER BY subject;
+---------+-----+-----+-------+
| subject | age | sex | score |
+---------+-----+-----+-------+
| 1 | 5 | M | 5 |
| 2 | 5 | M | 4 |
| 3 | 5 | F | 6 |
| 4 | 5 | F | 7 |
| 5 | 6 | M | 8 |
| 6 | 6 | M | 9 |
| 7 | 6 | F | 4 |
| 8 | 6 | F | 6 |
| 9 | 7 | M | 8 |
| 10 | 7 | M | 6 |
| 11 | 7 | F | 9 |
| 12 | 7 | F | 7 |
| 13 | 8 | M | 9 |
| 14 | 8 | M | 6 |
| 15 | 8 | F | 7 |
| 16 | 8 | F | 10 |
| 17 | 9 | M | 9 |
| 18 | 9 | M | 7 |
| 19 | 9 | F | 10 |
| 20 | 9 | F | 9 |
+---------+-----+-----+-------+A good first step in analyzing a set of observations is to generate some descriptive statistics that summarize their general characteristics as a whole. Common statistical values of this kind include:
The number of observations, their sum, and their range (minimum and maximum)
Measures of central tendency, such as mean, median, and mode
Measures of variation, such as standard deviation or variance
Aside from the median and mode, all of these can be calculated ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access