Chapter 7 Double Entry and Verification (PROC COMPARE) 159
If you examine the output, you will notice the errors in Gender and SBP that you saw earlier. In
addition, you can see the differences in the two dates (since the date was read as a character
value) and in the DBP values. (Note: The DBP differences show up only because the $CHAR informat
was used. This informat preserves leading blanks, whereas the $ informat left-adjusts character fields.)
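The effect of the two informats can be seen in a small test. The following is a minimal sketch, not one of the book's programs: the data set name INFORMAT_DEMO, the variable names, and the two data lines are made up for illustration. The first data line has a leading blank before 80; the second does not.

```sas
* Hypothetical demo. Reads the same three columns with both informats;
data informat_demo;
   input @1 DBP_char $char3.   /* keeps the leading blank */
         @1 DBP_std  $3.;      /* left-adjusts the value  */
   /* Same = 1 when both informats return identical values */
   Same = (DBP_char = DBP_std);
datalines;
 80
80
;

title "Values Read with the $CHAR3. and $3. Informats";
proc print data=informat_demo noobs;
run;
```

Under these assumptions, Same should be 0 for the first record and 1 for the second. This is why a misaligned DBP value is reported as a difference only when the field is read with $CHAR.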
Using PROC COMPARE with Two Data Sets That Have an
Unequal Number of Observations
You can compare two data sets with unequal numbers of observations, provided you include an
ID statement. To illustrate this, two new files (FILE_1B.TXT and FILE_2B.TXT) were created.
A new patient number (005) has been added to FILE_1.TXT to make FILE_1B.TXT, and patient
number 004 has been omitted from FILE_2.TXT to make FILE_2B.TXT. Here are the listings of
these two files.
FILE_1B.TXT
001M10211946130 80
002F12201950110 70
003M09141956140 90
004F10101960180100
005M01041930166 88
007m10321940184110
FILE_2B.TXT
001M1021194613080
002F12201950110 70
003M09141956144 90
007M10231940184110
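As a rough guide to how these listings become SAS data sets, here is a minimal sketch. It is not Program 7-1, which is not reproduced in this section. The column positions, informats, and the use of in-stream DATALINES instead of the external text files are all assumptions inferred from the listings above.

```sas
* Sketch only. Column positions and informats are assumed from the listings;
data one_b;
   input @1  Patno  3.
         @4  Gender $1.
         @5  DOB    mmddyy8.
         @13 SBP    3.
         @16 DBP    3.;
   format DOB mmddyy10.;
datalines;
001M10211946130 80
002F12201950110 70
003M09141956140 90
004F10101960180100
005M01041930166 88
007m10321940184110
;

data two_b;
   input @1  Patno  3.
         @4  Gender $1.
         @5  DOB    mmddyy8.
         @13 SBP    3.
         @16 DBP    3.;
   format DOB mmddyy10.;
datalines;
001M1021194613080
002F12201950110 70
003M09141956144 90
007M10231940184110
;
```

Patno is read as a numeric variable in this sketch, which matches the way PROC COMPARE reports it in the output for these data sets (Patno=4 rather than 004). The date 10321940 for patient 007 is not a valid date, so SAS writes an invalid data note to the log and sets DOB to missing for that record. Both data sets are already in Patno order, which the ID statement in PROC COMPARE expects.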
The two SAS data sets (ONE_B and TWO_B) are created by running Program 7-1 again with the
two new data files. PROC COMPARE is then run with the two options LISTBASE and
LISTCOMP (Program 7-5). These two options tell PROC COMPARE to print information on the
ID values that are not in both files, as seen below:
160 Cody's Data Cleaning Techniques Using SAS, Second Edition
Program 7-5 Running PROC COMPARE on Two Data Sets of Different Length
title "Comparing Two Data Sets with Different ID Values";
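* LISTBASE and LISTCOMP report observations found in only one data set;
proc compare base=one_b compare=two_b listbase listcomp;
   id Patno;   /* match observations on patient number */
run;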
Here is the output from Program 7-5 (partial listing).
Comparing Two Data Sets with Different ID Values (partial listing)
The COMPARE Procedure
Comparison of WORK.ONE_B with WORK.TWO_B
Variables Summary
Number of Variables in Common: 5.
Number of ID Variables: 1.
Comparison Results for Observations
Observation 4 in WORK.ONE_B not found in WORK.TWO_B: Patno=4.
Observation 5 in WORK.ONE_B not found in WORK.TWO_B: Patno=5.
Observation Summary
Observation      Base  Compare  ID

First Obs           1        1  Patno=1
First Unequal       3        3  Patno=3
Last Unequal        6        4  Patno=7
Last Obs            6        4  Patno=7
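The "not found" messages in this listing can be cross-checked with a DATA step merge. The following is a minimal sketch, not a program from the book. The stand-in data sets hold only the Patno values from the two listings, and the names ID_MISMATCH, In_one, In_two, and Source are made up for illustration.

```sas
* Hypothetical stand-ins holding only the ID variable from each file;
data one_b;
   input Patno;
datalines;
1
2
3
4
5
7
;

data two_b;
   input Patno;
datalines;
1
2
3
7
;

* Keep only the patient numbers that appear in one data set but not the other;
data id_mismatch;
   merge one_b(in=In_one) two_b(in=In_two);
   by Patno;
   length Source $ 8;
   if In_one and not In_two then Source = 'ONE_B';
   else if In_two and not In_one then Source = 'TWO_B';
   else delete;
run;

title "Patient Numbers Not Found in Both Data Sets";
proc print data=id_mismatch noobs;
run;
```

With these values, the listing should contain patient numbers 4 and 5, both with Source equal to ONE_B, which agrees with the two messages printed by PROC COMPARE.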
