Figure 19.7 Program Data Vector After Reading from Each Data Set
Abbott, Jennifer .
Abbott, Jennifer Hitchcock-Tyler, Erin .
Abbott, Jennifer Hitchcock-Tyler, Erin 14SEP2000 10:00 103
4. After processing the first observation from the last data set and executing any other
statements in the DATA step, SAS writes the contents of the program data vector to
the new data set. If the DATA step attempts to read past the end of a data set, then the
values of all variables from that data set in the program data vector are set to
missing.
This behavior has two important consequences:
• If a variable exists in more than one data set, then the value from the last data set
  SAS reads is the value that goes into the new data set, even if that value is
  missing. If you want to keep all the values for like-named variables from
  different data sets, then you must rename one or more of the variables with the
  RENAME= data set option so that each variable has a unique name.
• After SAS processes all observations in a data set, the program data vector and
  all subsequent observations in the new data set have missing values for the
  variables unique to that data set. So, as the next figure shows, the program data
  vector for the last observation in the new data set contains missing values for all
  variables except Name2.
Figure 19.8 Program Data Vector for the Last Observation
Wittich, Stefan .
5. SAS continues to merge observations until it has copied all observations from all
data sets.
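The RENAME= advice in step 4 can be sketched as follows. This is only an illustration: the data sets FIRST_HALF and SECOND_HALF and the variables Score, Score1, and Score2 are hypothetical and do not come from the examples in this chapter.

```sas
/* Hypothetical data sets: both contain a variable named Score. */
data first_half;
   input Name $ Score;
   datalines;
Ann 81
Bo 77
;
run;

data second_half;
   input Name $ Score;
   datalines;
Ann 90
Bo .
;
run;

/* One-to-one merge (no BY statement). Without RENAME=, the value of  */
/* Score from SECOND_HALF would replace the value from FIRST_HALF,    */
/* even where the SECOND_HALF value is missing.                       */
data both_halves;
   merge first_half(rename=(Score=Score1))
         second_half(rename=(Score=Score2));
run;

proc print data=both_halves;
   title 'Scores Kept Under Unique Names';
run;
```

Because each Score variable now has a unique name, the observation for Bo keeps 77 in Score1 while Score2 is missing.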
Match-Merging
Merging with a BY Statement
Merging with a BY statement enables you to match observations according to the values
of the BY variables that you specify. Before you can perform a match-merge, all data
sets must be sorted by the variables that you want to use for the merge.
In order to understand match-merging, you must understand three key concepts:
296 Chapter 19 Merging SAS Data Sets
BY variable
   specifies a variable that is named in a BY statement.
BY value
   specifies the value of a BY variable.
BY group
   specifies the set of all observations with the same value for the BY variable (if there
   is only one BY variable). If you use more than one variable in a BY statement, then a
   BY group is the set of observations with a unique combination of values for those
   variables. In discussions of match-merging, BY groups commonly span more than
   one data set.
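As a rough illustration of a BY group that is defined by more than one BY variable, the following sketch merges two data sets by State and City. The data sets OFFICES and BUDGETS and all of their variables are hypothetical and are used here only to show the idea.

```sas
/* Hypothetical data sets that share two key variables, State and City. */
data offices;
   input State $ City $ Staff;
   datalines;
NC Cary 120
NC Durham 45
TX Austin 60
;
run;

data budgets;
   input State $ City $ Budget;
   datalines;
TX Austin 900
NC Durham 300
NC Cary 1500
;
run;

/* Both data sets must be sorted by the same BY variables. */
proc sort data=offices;
   by State City;
run;

proc sort data=budgets;
   by State City;
run;

/* Each unique combination of State and City is one BY group. */
data office_budgets;
   merge offices budgets;
   by State City;
run;

proc print data=office_budgets;
   title 'Offices Merged with Budgets by State and City';
run;
```

In this sketch, each BY group (for example, State=NC and City=Cary) contains one observation from each data set, so the BY group spans both data sets.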
Input SAS Data Set for Examples
The director of a small repertory theater company, the Little Theater, maintains company
records in two SAS data sets, COMPANY and FINANCE.
Table 19.1 Variables in the COMPANY and FINANCE Data Sets
Data Set   Variable   Description
COMPANY    Name       player's name
           Age        player's age
           Gender     player's gender
FINANCE    Name       player's name
           IdNumber   player's employee ID number
           Salary     player's annual salary
The following program creates, sorts, and displays the COMPANY and FINANCE data
sets:
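/* Column input: Name is in columns 1-25, Age in 27-28, Gender in 30. */
/* The data lines must keep this column alignment.                    */
data company;
   input Name $ 1-25 Age 27-28 Gender $ 30;
   datalines;
Vincent, Martina          34 F
Phillipon, Marie-Odile    28 F
Gunter, Thomas            27 M
Harbinger, Nicholas       36 M
Benito, Gisela            32 F
Rudelich, Herbert         39 M
Sirignano, Emily          12 F
Morrison, Michael         32 M
;
run;

/* Sort by the BY variable that the merge will use. */
proc sort data=company;
   by Name;
run;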
/* Column input: IdNumber is in columns 1-11 and Name in 13-37. */
/* Salary is read with list input after the Name field.         */
data finance;
   input IdNumber $ 1-11 Name $ 13-37 Salary;
   datalines;
074-53-9892 Vincent, Martina          35000
776-84-5391 Phillipon, Marie-Odile    29750
929-75-0218 Gunter, Thomas            27500
446-93-2122 Harbinger, Nicholas       33900
228-88-9649 Benito, Gisela            28000
029-46-9261 Rudelich, Herbert         35000
442-21-8075 Sirignano, Emily          5000
;
run;

/* Sort by the BY variable that the merge will use. */
proc sort data=finance;
   by Name;
run;

proc print data=company;
   title 'Little Theater Company Roster';
run;

proc print data=finance;
   title 'Little Theater Employee Information';
run;
The following output displays the COMPANY and FINANCE data sets. Notice that the
FINANCE data set does not contain an observation for Michael Morrison:
Figure 19.9 The COMPANY Data Set
Figure 19.10 The FINANCE Data Set
The Program
To avoid having to maintain two separate data sets, the director wants to merge the
records for each player from both data sets into a new data set that contains all of the
variables. The variable that is common to both data sets is Name. Therefore, Name is the
appropriate BY variable.
The data sets are already sorted by Name, so no further sorting is required. The
following program merges them by Name:
data employee_info;
   merge company finance;
   by name;
run;

proc print data=employee_info;
   title 'Little Theater Employee Information';
   title2 '(including personal and financial information)';
run;
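Because FINANCE has no observation for Michael Morrison, it can be useful to see which players lack a match. The following is a minimal sketch that uses the IN= data set option for that purpose. It assumes that the sorted COMPANY and FINANCE data sets created earlier still exist. The names EMPLOYEE_CHECK, InCompany, InFinance, and MissingFinance are hypothetical and are introduced here only for illustration.

```sas
/* Assumes the sorted COMPANY and FINANCE data sets from the earlier program. */
data employee_check;
   merge company(in=InCompany) finance(in=InFinance);
   by Name;
   /* IN= variables are temporary and are not written to the output data set, */
   /* so the result is copied into an ordinary variable.                      */
   MissingFinance = (InCompany and not InFinance);
run;

proc print data=employee_check;
   where MissingFinance = 1;
   title 'Players Without a FINANCE Observation';
run;
```

With the data shown in this section, the report should list only Morrison, Michael, whose IdNumber and Salary values are missing in the merged data set.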
