Understanding How Duplicate BY Variables Affect
File Update
How the DATA Step Processes Duplicate BY Variables
When you use a BY statement with MODIFY, both the master and the transaction data
sets can have observations with duplicate values of BY variables. Neither the master nor
the transaction data set needs to be sorted, because BY-group processing uses dynamic
WHERE processing to find an observation in the master data set.
The DATA step processes duplicate observations in the following ways:
If duplicate BY values exist in the master data set, then MODIFY applies the current
transaction to the first occurrence in the master data set.
If duplicate BY values exist in the transaction data set, then the observations are
applied one on top of another so that the values overwrite each other. The value in
the last transaction is the final value in the master data set.
If both the master and the transaction data sets contain duplicate BY values, then
MODIFY applies each transaction to the first occurrence in the group in the master
data set.
The Program
The program in this section updates the INVENTORY_2 master data set with
observations from the ADD_INVENTORY_2 transaction data set. Both data sets contain
consecutive and nonconsecutive duplicate values of the BY variable PartNumber.
The following program creates the master data set INVENTORY_2. Note that the data
set contains three observations for PartNumber M4J7.
data inventory_2;
input PartNumber $ Description $ InStock @17
ReceivedDate date9. @27 Price;
format ReceivedDate date9.;
datalines;
K89R seal 34 27jul1998 245.00
M4J7 sander 98 20jun2012 45.88
M4J7 sander 98 20jun2012 45.88
LK43 filter 121 19may2013 10.99
MN21 brace 43 10aug2013 27.87
M4J7 sander 98 20jun2012 45.88
BC85 clamp 80 16aug2013 9.55
NCF3 valve 198 20mar2013 24.50
KJ66 cutter 6 18jun2013 19.77
;
run;
proc print data=inventory_2;
title 'INVENTORY_2 Data Set';
run;
Understanding How Duplicate BY Variables Affect File Update 343
The following output displays the INVENTORY_2 data set:
Figure 21.4 The INVENTORY_2 Data Set
The following program creates the transaction data set ADD_INVENTORY_2, and then
modifies the master data set INVENTORY_2. Note that the data set
ADD_INVENTORY_2 contains three observations for PartNumber M4J7.
data add_inventory_2;
input PartNumber $ Description $ NewStock;
datalines;
K89R abc 17
M4J7 def 72
M4J7 ghi 66
LK43 jkl 311
M4J7 mno 43
BC85 pqr 75
;
run;
data inventory_2;
modify inventory_2 add_inventory_2;
by PartNumber;
ReceivedDate=today();
InStock=InStock+NewStock;
run;
proc print data=inventory_2;
title 'Tool Warehouse Inventory';
run;
344 Chapter 21 Modifying SAS Data Sets

Get Step-by-Step Programming with Base SAS 9.4, Second Edition, 2nd Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.