Calculating Descriptive Statistics
Problem
You want to characterize a dataset by computing general descriptive or summary statistics.
Solution
Many common descriptive statistics, such as mean and standard deviation, can be obtained by applying aggregate functions to your data. Others, such as median or mode, can be calculated based on counting queries.
Discussion
Suppose that you have a table testscore
containing observations
representing subject ID, age, sex, and test score:
mysql>SELECT subject, age, sex, score FROM testscore ORDER BY subject;
+---------+-----+-----+-------+
| subject | age | sex | score |
+---------+-----+-----+-------+
| 1 | 5 | M | 5 |
| 2 | 5 | M | 4 |
| 3 | 5 | F | 6 |
| 4 | 5 | F | 7 |
| 5 | 6 | M | 8 |
| 6 | 6 | M | 9 |
| 7 | 6 | F | 4 |
| 8 | 6 | F | 6 |
| 9 | 7 | M | 8 |
| 10 | 7 | M | 6 |
| 11 | 7 | F | 9 |
| 12 | 7 | F | 7 |
| 13 | 8 | M | 9 |
| 14 | 8 | M | 6 |
| 15 | 8 | F | 7 |
| 16 | 8 | F | 10 |
| 17 | 9 | M | 9 |
| 18 | 9 | M | 7 |
| 19 | 9 | F | 10 |
| 20 | 9 | F | 9 |
+---------+-----+-----+-------+
A good first step in analyzing a set of observations is to generate some descriptive statistics that summarize their general characteristics as a whole. Common statistical values of this kind include:
The number of observations, their sum, and their range (minimum and maximum)
Measures of central tendency, such as mean, median, and mode
Measures of variation, such as standard deviation and variance
Aside from the median and mode, all of these can be calculated ...
Get MySQL Cookbook, 2nd Edition now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.