Testing Your Program
As a final step in preparing your data sets, you should test your program. Create small
temporary SAS data sets that contain a sample of observations that test all of your
program's logic. If your logic is faulty and you get unexpected output, you can use the
DATA step debugger to debug your program. For complete information about the
DATA Step Debugger, see SAS Data Set Options: Reference.
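Here is a minimal sketch of this kind of test. It assumes a hypothetical rule that sorts orders into size classes. The temporary data set WORK.TEST_ORDERS has one observation for each branch of the logic, plus a missing value, and PUTLOG writes each result to the log so you can check it. All data set names, variables, and the cutoff of 100 are illustrative only.

```sas
/* Hypothetical test data: one row per branch of the logic,    */
/* plus a missing value, which SAS treats as smaller than any  */
/* number and which is easy to forget.                         */
data work.test_orders;
   input id amount;
   datalines;
1 50
2 100
3 250
4 .
;

/* The (hypothetical) logic under test: classify each order. */
data work.test_result;
   set work.test_orders;
   length size $ 7;
   if missing(amount) then size = 'UNKNOWN';
   else if amount >= 100 then size = 'LARGE';
   else size = 'SMALL';
   /* Write each result to the log for inspection. */
   putlog id= amount= size=;
run;
```

If the log shows unexpected values, you can add the DEBUG option to the DATA statement (data work.test_result / debug;). When you run the step interactively, this starts the DATA step debugger.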
Combining SAS Data Sets: Methods
Concatenating
Definition
Concatenating data sets combines two or more data sets, one after the other,
into a single data set. The number of observations in the new data set is the sum of the
numbers of observations in the original data sets, and the observations keep their
sequential order: all observations from the first data set are followed by all observations
from the second data set, and so on.
In the simplest case, all input data sets contain the same variables. If the input data sets
contain different variables, observations from one data set have missing values for
variables that are defined only in other data sets. In either case, the new data set
contains every variable from every input data set.
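The following sketch shows both points with two hypothetical data sets, WORK.JAN and WORK.FEB. Both contain the variable Id. Sales exists only in JAN, and Region exists only in FEB. The concatenated data set therefore has 2 + 3 = 5 observations, with JAN's observations first. Each observation has a missing value for the variable that its source data set lacked.

```sas
/* Hypothetical inputs: Id is common, Sales and Region are not. */
data work.jan;
   input id sales;
   datalines;
1 100
2 150
;

data work.feb;
   input id region $;
   datalines;
3 East
4 West
5 North
;

/* Concatenate: all JAN observations, then all FEB observations. */
data work.both;
   set work.jan work.feb;
run;

/* Region is blank in the JAN rows; Sales is . in the FEB rows. */
proc print data=work.both;
run;
```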
Syntax
Use this form of the SET statement to concatenate data sets:
SET data-set(s);
where
data-set
specifies any valid SAS data set name.
For a complete description of valid SAS data set names, see the SET statement in SAS
Statements: Reference.
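To illustrate the syntax, the sketch below concatenates four hypothetical data sets, QTR1 through QTR4, in three ways. The first step uses one-level names, the second uses two-level libref.member names, and the third uses a numbered range list, which the SET statement accepts in SAS 9.4. All three steps should produce the same four observations.

```sas
/* Four hypothetical one-observation data sets in WORK. */
data qtr1; month = 1;  run;
data qtr2; month = 4;  run;
data qtr3; month = 7;  run;
data qtr4; month = 10; run;

/* One-level names refer to the WORK library (unless a USER */
/* library is in effect).                                   */
data year_a;
   set qtr1 qtr2 qtr3 qtr4;
run;

/* Two-level names (libref.member) state the library explicitly. */
data year_b;
   set work.qtr1 work.qtr2 work.qtr3 work.qtr4;
run;

/* A numbered range list names the same four data sets. */
data year_c;
   set qtr1-qtr4;
run;
```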
DATA Step Processing during Concatenation
Compilation phase
SAS reads the descriptor information of each data set that is named in the SET
statement and then creates a program data vector that contains all the variables from
all data sets as well as variables created by the DATA step.
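One way to see the result of compilation is to write the program data vector to the log before the SET statement has read anything. This sketch uses two hypothetical data sets. Compilation has already built the PDV from the descriptors of WORK.LEFT and WORK.RIGHT and from the assignment statement. As a result, the first PUTLOG _ALL_ lists A, B, C, and Total, all missing, together with the automatic variables _ERROR_ and _N_.

```sas
/* Hypothetical inputs with overlapping variables. */
data work.left;
   input a b;
   datalines;
1 2
;

data work.right;
   input b c $;
   datalines;
3 x
;

data work.combined;
   /* Runs before SET reads anything. The PDV already holds A, B,  */
   /* and C from both descriptors, and Total from the assignment.  */
   /* All of their values are still missing at this point.         */
   if _n_ = 1 then putlog 'Before first read: ' _all_;
   set work.left work.right;
   total = sum(a, b);
run;
```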
Execution — Step 1
SAS reads the first observation from the first data set into the program data vector. It
processes the first observation and executes other statements in the DATA step. It
then writes the contents of the program data vector to the new data set.
Values in the program data vector are not reset to missing at the beginning of each
iteration, except for variables whose values are calculated or assigned in the DATA step.
Variables that are created by the DATA step are set to missing at the beginning of
each iteration; variables that are read with the SET statement keep their values until
the next observation is read.
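Here is a small sketch of this difference. It assumes a hypothetical one-variable data set, WORK.ONE. The PUTLOG statement runs at the top of each iteration, before SET reads the next observation.

```sas
/* Hypothetical input with one numeric variable. */
data work.one;
   input x;
   datalines;
1
2
3
;

data work.demo;
   /* Top of each iteration: X still holds the value read in the */
   /* previous iteration, but Doubled has been reset to missing.  */
   putlog 'Top of iteration ' _n_= x= doubled=;
   set work.one;
   doubled = x * 2;
run;
```

In the log, X shows the value read in the previous iteration and is missing only on the first iteration. Doubled is missing every time, because an assignment statement creates it. The step stops during the fourth iteration, when SET reaches the end of the file.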
Execution — Step 2
SAS continues to read one observation at a time from the first data set until it finds
an end-of-file indicator. The values of the variables in the program data vector are
then set to missing, and SAS begins reading observations from the second data set,
and so on, until it reads all observations from all data sets.
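You can trace the switch between data sets in the same way. This sketch uses two hypothetical data sets: WORK.A, with variable X, and WORK.B, with variable Y. At the top of the third iteration, the log still shows X=2. SET then reaches the end of WORK.A, the PDV values are set to missing, and SET reads the first observation of WORK.B. As a result, X is missing in the third and fourth output observations.

```sas
/* Hypothetical inputs with no variables in common. */
data work.a;
   input x;
   datalines;
1
2
;

data work.b;
   input y;
   datalines;
10
20
;

data work.ab;
   /* Show the PDV at the top of each iteration, before SET reads. */
   putlog 'Top of iteration ' _n_= x= y=;
   set work.a work.b;
run;
```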
Example 1: Concatenation Using the DATA Step
In this example, each data set contains the variables Common and Number, and the
observations are arranged in the order of the values of Common. Generally, you
concatenate SAS data sets that have the same variables. In this case, each data set also
contains a unique variable to show the effects of combining data sets more clearly. The
following shows the Animal and the Plant input data sets in the library that is referenced
by the libref Example:
Animal
OBS  Common  Animal    Number
 1     a     Ant          5
 2     b     Bird         .
 3     c     Cat         17
 4     d     Dog          9
 5     e     Eagle        .
 6     f     Frog        76

Plant
OBS  Common  Plant     Number
 1     g     Grape       69
 2     h     Hazelnut    55
 3     i     Indigo       .
 4     j     Jicama      14
 5     k     Kale         5
 6     l     Lentil      77
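If you do not have an Example library, one option is to recreate the two data sets from the values listed above. The LIBNAME statement below is an assumption made for illustration. It points the libref Example at the WORK directory, so the data sets are deleted when the session ends. To keep them, point the libref at your own library location instead.

```sas
/* Stand-in for the Example library: reuse the WORK directory. */
libname example "%sysfunc(pathname(work))";

data example.animal;
   input Common $ Animal $ Number;
   datalines;
a Ant 5
b Bird .
c Cat 17
d Dog 9
e Eagle .
f Frog 76
;

data example.plant;
   input Common $ Plant $ Number;
   datalines;
g Grape 69
h Hazelnut 55
i Indigo .
j Jicama 14
k Kale 5
l Lentil 77
;
```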
The following program uses a SET statement to concatenate the data sets and then prints
the results:
/* Assumes the libref EXAMPLE has been assigned with a LIBNAME statement. */
data concatenation;
   set example.animal example.plant;
run;

proc print data=concatenation;
   var Common Animal Plant Number;
   title 'Data Set CONCATENATION';
run;