Chapter 7 Double Entry and Verification (PROC COMPARE) 159
If you examine the output, you will notice the errors in Gender and SBP that you saw earlier. In
addition, you can see the differences in the two dates (since the date was read as a character value)
and in the DBP values. (Note: The differences show up only because the $CHAR informat was used.
This informat preserves leading blanks, while the $ informat left-adjusts character fields.)
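The note above can be tried out with a short sketch. The data values, the variable names (Left_Adjusted, As_Entered, Show_Left, Show_Entered), and the five-column width are assumptions made for illustration; they are not taken from the files used in this chapter. The same columns are read twice, once with each informat, and brackets are added so that any leading blanks are visible in the SAS log.

```sas
data _null_;
   /* Read the same five columns twice, once with each informat */
   input @1 Left_Adjusted $5.
         @1 As_Entered    $char5.;
   /* Brackets make leading and trailing blanks visible */
   Show_Left    = '[' || Left_Adjusted || ']';
   Show_Entered = '[' || As_Entered    || ']';
   put Show_Left= Show_Entered=;
datalines;
  120
   80
;
```

If the informats behave as described, the value read with $5. should start in the first position inside the brackets, while the value read with $CHAR5. should keep its leading blanks. Two values that differ only in leading blanks are therefore unequal under $CHAR, which is why PROC COMPARE reports them.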
Using PROC COMPARE with Two Data Sets That Have an
Unequal Number of Observations
You can compare two data sets with unequal numbers of observations, provided you include an
ID statement. To illustrate this, two new files (FILE_1B.TXT and FILE_2B.TXT) were created.
A new patient number (005) has been added to FILE_1.TXT to make FILE_1B.TXT, and patient
number 004 has been omitted from FILE_2.TXT to make FILE_2B.TXT. Here are the listings of
these two files.
The two SAS data sets (ONE_B and TWO_B) are created by running Program 7-1 again with the
two new data files. PROC COMPARE is then run with the two options LISTBASE and
LISTCOMP (Program 7-5). These two options tell PROC COMPARE to print information on the
ID values that are not in both files, as seen below:
160 Cody's Data Cleaning Techniques Using SAS, Second Edition
Program 7-5 Running PROC COMPARE on Two Data Sets of Different Length
title "Comparing Two Data Sets with Different ID Values";
proc compare base=one_b compare=two_b listbase listcomp;
   id Patno;
run;
Here is the output from Program 7-5 (partial listing).
Comparing Two Data Sets with Different ID Values (partial listing)
The COMPARE Procedure
Comparison of WORK.ONE_B with WORK.TWO_B
Number of Variables in Common: 5.
Number of ID Variables: 1.
Comparison Results for Observations
Observation 4 in WORK.ONE_B not found in WORK.TWO_B: Patno=4.
Observation 5 in WORK.ONE_B not found in WORK.TWO_B: Patno=5.
Observation      Base  Compare  ID

First Obs           1        1  Patno=1
First Unequal       3        3  Patno=3
Last Unequal        6        4  Patno=7
Last Obs            6        4  Patno=7
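To experiment with LISTBASE and LISTCOMP without the raw data files, a minimal self-contained sketch follows. The data sets DEMO_BASE and DEMO_COMP and all of their values are hypothetical and are not the data used in this chapter. Because PROC COMPARE with an ID statement expects both data sets to be in ID order (unless the NOTSORTED option is used), each data set is sorted first.

```sas
/* Hypothetical data: the values are made up for illustration only */
data demo_base;
   input Patno SBP;
datalines;
1 120
2 130
3 140
5 150
;

data demo_comp;
   input Patno SBP;
datalines;
1 120
3 144
4 160
;

/* With an ID statement, both data sets should be in ID order */
proc sort data=demo_base;
   by Patno;
run;

proc sort data=demo_comp;
   by Patno;
run;

title "Hypothetical Example of LISTBASE and LISTCOMP";
proc compare base=demo_base compare=demo_comp listbase listcomp;
   id Patno;
run;
```

With these made-up values, the report should list Patno 2 and 5 as present only in the base data set and Patno 4 as present only in the comparison data set, and it should show a difference in SBP for Patno 3.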