Combining SAS Data Sets: Basic Concepts
What You Need to Know Before Combining Information Stored in Multiple SAS Data Sets
Many applications require input data to be in a specific format before the data can be
processed to produce meaningful results. The data typically comes from multiple sources
and might be in different formats. Therefore, you usually have to take intermediate steps
to logically relate and process the data before you can analyze it or create reports
from it.
Application requirements vary, but there are common factors for all applications that
access, combine, and process data. Once you have determined what you want the output
to look like, you must perform the following tasks:
• Determine how the input data is related.
• Ensure that the data is properly sorted or indexed, if necessary.
• Select the appropriate access method to process the input data.
• Select the appropriate SAS tools to complete the task.
The Four Ways That Data Can Be Related
Data Relationship Categories
Relationships among multiple sources of input data exist when each of the sources
contains common data, either at the physical or logical level. For example, employee
data and department data could be related through an employee ID variable that shares
common values. Another data set could contain numeric sequence numbers whose
partial values logically relate it to a separate data set by observation number.
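The employee and department example above can be sketched in SAS as follows. This is a minimal illustration under stated assumptions, not a prescribed method: the data set names (employee, department, emp_dept), the variable names (EmpID, Name, Dept), and all data values are hypothetical. It relates the two data sets through the common EmpID variable by sorting both and then match-merging them with a BY statement.

```sas
/* Hypothetical employee data, identified by EmpID */
data employee;
   input EmpID $ Name $;
   datalines;
E001 Alvarez
E002 Chen
E003 Okafor
;
run;

/* Hypothetical department data that shares the EmpID variable */
data department;
   input EmpID $ Dept $;
   datalines;
E002 Finance
E001 Sales
E003 Support
;
run;

/* BY-group processing expects sorted or indexed input, */
/* so sort both data sets by the common variable first  */
proc sort data=employee;
   by EmpID;
run;

proc sort data=department;
   by EmpID;
run;

/* Combine observations that have the same EmpID value */
data emp_dept;
   merge employee department;
   by EmpID;
run;

proc print data=emp_dept;
run;
```

In this sketch each EmpID value occurs exactly once in each data set. How duplicate or nonmatched values of the common variable are handled depends on the combining method that you select.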
You must be able to identify the existing relationships in your data. This knowledge is
crucial for understanding how to process input data in order to produce desired results.
All related data falls into one of these four categories, characterized by how observations
relate among the data sets:
• one-to-one
• one-to-many
• many-to-one
• many-to-many
To obtain the results that you want, you should understand how each method of combining
data sets combines observations, how it treats duplicate values of common variables,
and how it treats missing values or nonmatched values of common variables.
Some of the methods also require that you preprocess your data sets by sorting them or
by creating indexes. See the description of each method in “Combining SAS Data Sets:
Methods” on page 478.
In a one-to-one relationship, typically a single observation in one data set is related to a
single observation from another based on the values of one or more selected variables. A
468 Chapter 21 • Reading, Combining, and Modifying SAS Data Sets