Counting Missing Values
Problem
A set of observations is incomplete. You want to find out how much so.
Solution
Count the number of NULL values in the set.
Discussion
Values can be missing from a set of observations for any number of
reasons: A test may not yet have been administered, something may
have gone wrong during the test that requires invalidating the
observation, and so forth. You can represent such observations in a
dataset as NULL values to signify that
they’re missing or otherwise invalid, then use
summary queries to characterize the completeness of the dataset.
If a table t contains values to be summarized
along a single dimension, a simple summary will do to characterize
the missing values. Suppose t looks like this:
mysql> SELECT subject, score FROM t ORDER BY subject;
+---------+-------+
| subject | score |
+---------+-------+
| 1 | 38 |
| 2 | NULL |
| 3 | 47 |
| 4 | NULL |
| 5 | 37 |
| 6 | 45 |
| 7 | 54 |
| 8 | NULL |
| 9 | 40 |
| 10 | 49 |
+---------+-------+COUNT(*) counts the total number of rows and
COUNT(score) counts only the number of non-missing
scores. The difference between the two is the number of missing
scores, and that difference in relation to the total provides the
percentage of missing scores. These calculations are expressed as
follows:
mysql>SELECT COUNT(*) AS 'n (total)',->COUNT(score) AS 'n (non-missing)',->COUNT(*) - COUNT(score) AS 'n (missing)',->((COUNT(*) - COUNT(score)) * 100) / COUNT(*) AS '% missing'->FROM t;+-----------+-----------------+-------------+-----------+ ...