52 Cody’s Data Cleaning Techniques Using SAS, Second Edition
you will generate the following output:
7 Highest and Lowest Values for HR
7 Lowest Values
Patno = 020 Value = 10
Patno = 014 Value = 22
Patno = 023 Value = 22
Patno = 022 Value = 48
Patno = 003 Value = 58
Patno = 019 Value = 58
Patno = 012 Value = 60
7 Highest Values
Patno = 001 Value = 88
Patno = 007 Value = 88
Patno = Value = 90
Patno = 004 Value = 101
Patno = 017 Value = 208
Patno = 008 Value = 210
Patno = 321 Value = 900
Using PROC PRINT with a WHERE Statement to List Invalid Data Values
This section examines ways to detect possible data errors when you can determine a reasonable
range for each variable. This approach works well for variables such as heart rates and blood
pressures, but it may not be feasible for other types of variables, such as financial values, that
can take on a very large range of possible values.
One simple way to check each numeric variable for invalid values, when you know the
reasonable range, is to use PROC PRINT with an appropriate WHERE statement.
Suppose you want to check all the data for any patient having a heart rate outside the range of 40
to 100, a systolic blood pressure outside the range of 80 to 200, and a diastolic blood pressure
outside the range of 60 to 120. For this example, missing values are not treated as invalid. The
PROC PRINT step in Program 2-12 reports all patients with out-of-range values for heart rate,
systolic blood pressure, or diastolic blood pressure.
Chapter 2 Checking Values of Numeric Variables 53
Program 2-12 Using a WHERE Statement with PROC PRINT to List Out-of-Range Data
title "Out-of-range values for numeric variables";
proc print data=clean.patients;
   where (HR not between 40 and 100 and HR is not missing) or
         (SBP not between 80 and 200 and SBP is not missing) or
         (DBP not between 60 and 120 and DBP is not missing);
   id Patno;
   var HR SBP DBP;
run;
You don't need the parentheses in the WHERE statement because the AND operator is evaluated
before the OR operator. However, because this author can never seem to remember the order of
operations for Boolean operators, the parentheses were included for clarity. Extra parentheses do no
harm.
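To illustrate the precedence rule, the following sketch repeats the WHERE statement from Program 2-12 with the parentheses removed. Because AND is evaluated before OR, it should select the same observations. It assumes the same clean.patients data set; only the title text is new.

```sas
title "Out-of-range values for numeric variables (no parentheses)";
proc print data=clean.patients;
   /* AND is evaluated before OR, so each range test still pairs */
   /* with the missing-value test for the same variable          */
   where HR  not between 40 and 100 and HR  is not missing or
         SBP not between 80 and 200 and SBP is not missing or
         DBP not between 60 and 120 and DBP is not missing;
   id Patno;
   var HR SBP DBP;
run;
```

The parenthesized form is still easier to read, which is the reason it was used in the program above.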
The resulting output is shown next.
Out-of-range values for numeric variables

Patno     HR    SBP    DBP
004      101    200    120
008      210      .      .
009       86    240    180
010        .     40    120
011       68    300     20
014       22    130     90
017      208      .     84
321      900    400    200
020       10     20      8
023       22     34     78
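
The missing values in this listing appear only because another variable in the same observation is out of range. The IS NOT MISSING tests matter because SAS treats a missing numeric value as smaller than any number, so a range test alone would flag it. The following is a minimal sketch of that behavior, using a small hypothetical data set (work.hr_test and its three patient numbers are invented for illustration and are not part of the book's data).

```sas
/* Hypothetical test data: three patients, one with a missing heart rate */
data work.hr_test;
   input Patno $ HR;
datalines;
A01 72
A02 .
A03 210
;
run;

title "Range check without a missing-value test";
proc print data=work.hr_test;
   /* A missing HR compares as lower than 40, so A02 should be */
   /* listed along with A03                                    */
   where HR not between 40 and 100;
   id Patno;
   var HR;
run;

title "Range check that ignores missing values";
proc print data=work.hr_test;
   /* Only A03 should be listed */
   where HR not between 40 and 100 and HR is not missing;
   id Patno;
   var HR;
run;
```

If missing values should be reported as errors, dropping the IS NOT MISSING tests is one simple way to include them.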
