Calculating Descriptive Statistics
Problem
You want to characterize a dataset by computing general descriptive or summary statistics.
Solution
Many common descriptive statistics, such as mean and standard deviation, can be obtained by applying aggregate functions to your data. Others, such as median or mode, can be calculated based on counting queries.
Discussion
Suppose you have a table testscore
containing
observations representing subject ID, age, sex, and test score:
mysql> SELECT subject, age, sex, score FROM testscore ORDER BY subject;
+---------+-----+-----+-------+
| subject | age | sex | score |
+---------+-----+-----+-------+
| 1 | 5 | M | 5 |
| 2 | 5 | M | 4 |
| 3 | 5 | F | 6 |
| 4 | 5 | F | 7 |
| 5 | 6 | M | 8 |
| 6 | 6 | M | 9 |
| 7 | 6 | F | 4 |
| 8 | 6 | F | 6 |
| 9 | 7 | M | 8 |
| 10 | 7 | M | 6 |
| 11 | 7 | F | 9 |
| 12 | 7 | F | 7 |
| 13 | 8 | M | 9 |
| 14 | 8 | M | 6 |
| 15 | 8 | F | 7 |
| 16 | 8 | F | 10 |
| 17 | 9 | M | 9 |
| 18 | 9 | M | 7 |
| 19 | 9 | F | 10 |
| 20 | 9 | F | 9 |
+---------+-----+-----+-------+
A good first step in analyzing a set of observations is to generate some descriptive statistics that summarize their general characteristics as a whole. Common statistical values of this kind include:
The number of observations, their sum, and their range (minimum and maximum)
Measures of central tendency, such as mean, median, and mode
Measures of variation, such as standard deviation or variance
Aside from the median and mode, all of these can be calculated ...
Get MySQL Cookbook now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.