Generating Frequency Distributions
Problem
You want to know the frequency of occurrence for each value in a table.
Solution
Derive a frequency distribution that summarizes the contents of your dataset.
Discussion
A common application for per-group summary techniques is to
generate a breakdown of the number of times each value occurs. This is
called a frequency distribution. For the testscore table, the frequency distribution
looks like this:
mysql>SELECT score, COUNT(score) AS occurrence->FROM testscore GROUP BY score;+-------+------------+ | score | occurrence | +-------+------------+ | 4 | 2 | | 5 | 1 | | 6 | 4 | | 7 | 4 | | 8 | 2 | | 9 | 5 | | 10 | 2 | +-------+------------+
If you express the results in terms of percentages rather than as counts, you produce a relative frequency distribution. To break down a set of observations and show each count as a percentage of the total, use one query to get the total number of observations and another to calculate the percentages for each group:
mysql>SELECT @n := COUNT(score) FROM testscore;mysql>SELECT score, (COUNT(score)*100)/@n AS percent->FROM testscore GROUP BY score;+-------+---------+ | score | percent | +-------+---------+ | 4 | 10 | | 5 | 5 | | 6 | 20 | | 7 | 20 | | 8 | 10 | | 9 | 25 | | 10 | 10 | +-------+---------+
The distributions just shown summarize the number of values for individual scores. However, if the dataset contains a large number of distinct values and you want a distribution that shows only a small number of ...