Categorizing Noncategorical Data
Problem
You need to summarize a set of values that are not naturally categorical.
Solution
Use an expression to group the values into categories.
Discussion
The recipe "Grouping by Expression Results" showed how to group rows by
expression results. One important application for doing so is to
provide categories for values that are not particularly categorical.
This is useful because GROUP BY works best for columns with
repetitive values. For example, you might attempt to perform a
population analysis by grouping rows in the states table using
values in the pop column. As it happens, that does not
work very well due to the high number of distinct values in the
column. In fact, they’re all distinct, as the
following query shows:
mysql> SELECT COUNT(pop), COUNT(DISTINCT pop) FROM states;
+------------+---------------------+
| COUNT(pop) | COUNT(DISTINCT pop) |
+------------+---------------------+
|         50 |                  50 |
+------------+---------------------+
In situations like this, where values do not group nicely into a small number of sets, you can use a transformation that forces them into categories. Begin by determining the range of population values:
mysql> SELECT MIN(pop), MAX(pop) FROM states;
+----------+----------+
| MIN(pop) | MAX(pop) |
+----------+----------+
|   506529 | 35893799 |
+----------+----------+
You can see from that result that if you divide the pop values by five million, they’ll group into six categories, a reasonable number. (The category ranges will be 1 to 5,000,000, ...
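The division described above can be sketched as a grouping expression. This is a minimal illustration, not necessarily the exact statement the recipe goes on to use: the column aliases category and number of states are assumptions chosen for readability, and the query relies only on the states table and its pop column shown earlier.

```sql
-- Sketch only: bucket each state by population in steps of five million.
-- Adding 4,999,999 before dividing makes FLOOR() round up for integer
-- pop values, so 1 to 5,000,000 falls in category 1, 5,000,001 to
-- 10,000,000 in category 2, and so on.
SELECT FLOOR((pop + 4999999) / 5000000) AS category,  -- alias is an assumption
       COUNT(*) AS `number of states`                 -- alias is an assumption
FROM states
GROUP BY category
ORDER BY category;
```

With the minimum and maximum shown earlier, the smallest state lands in category 1 and the largest in category 8. A category that contains no states produces no row, so the output can have fewer rows than the highest category number.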