Determining Whether the Data Requires Preprocessing for BY-Group Processing
Before you process one or more SAS data sets using grouped or ordered data with the
SET, MERGE, or UPDATE statements, you must check the data to determine whether
they require preprocessing. They require no preprocessing if the observations in all of
the data sets occur in one of the following patterns:
• ascending or descending numeric order
• ascending or descending character order
• not alphabetically or numerically ordered, but grouped in some way, such as by
calendar month or by a formatted value
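As an illustration of the grouped-but-unordered case, the following minimal sketch uses the NOTSORTED option of the BY statement. The data set SALES and its variables Month and Amount are hypothetical and are not part of the original text.

```sas
/* Hypothetical data: observations are grouped by Month in calendar order, */
/* which is neither ascending nor descending character order.              */
data sales;
   input Month $ Amount;
   datalines;
Jan 100
Jan 150
Feb 200
Mar 50
Mar 75
;
run;

/* NOTSORTED tells SAS that each BY group is contiguous but that the */
/* groups themselves are not in sorted order.                        */
data monthly_totals;
   set sales;
   by Month notsorted;
   if first.Month then Total = 0;   /* reset at the start of each group */
   Total + Amount;                  /* sum statement retains Total      */
   if last.Month then output;       /* one observation per group        */
   drop Amount;
run;
```

With this input, MONTHLY_TOTALS should contain one observation per month (Jan 250, Feb 200, Mar 125) without any sorting step.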
If the observations are not in the order that you want, you must either sort the data set or
create an index for it before using BY-group processing.
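The indexing alternative can be sketched as follows. This example assumes that the data set INFORMATION used later in this section resides in the WORK library and contains the variable State. The output data set name STATE_COUNTS is hypothetical.

```sas
/* Create a simple index on State instead of physically sorting the data. */
proc datasets library=work nolist;
   modify information;
   index create State;
quit;

/* The BY statement can now use the index to read the observations */
/* in State order, even though the data set itself is unsorted.    */
data state_counts;
   set information;
   by State;
   if first.State then Count = 0;
   Count + 1;
   if last.State then output;
   keep State Count;
run;
```

An index leaves the physical order of the data set unchanged. Reading a large data set through an index can be slower than reading a sorted copy sequentially.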
If you use the MODIFY statement in BY-group processing, you do not need to presort
the input data. Presorting, however, can make processing more efficient and less costly.
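A minimal sketch of BY-group processing with the MODIFY statement on unsorted data is shown below. The data sets STOCK and RECEIVED and their variables are hypothetical. The check of the automatic variable _IORC_ is one common way to detect a matching observation, not the only one.

```sas
/* Hypothetical master data set, deliberately not sorted by PartNo. */
data stock;
   input PartNo $ Quantity;
   datalines;
B200 10
A100 5
C300 7
;
run;

/* Hypothetical transaction data set, also unsorted. */
data received;
   input PartNo $ Added;
   datalines;
C300 3
A100 2
;
run;

/* MODIFY locates each matching master observation by PartNo, */
/* so neither data set has to be sorted first.                */
data stock;
   modify stock received;
   by PartNo;
   if _iorc_ = 0 then do;          /* a matching master observation was found */
      Quantity = Quantity + Added;
      replace;
   end;
   else _error_ = 0;               /* no match: clear the error flag and skip */
run;
```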
You can use PROC SQL views in BY-group processing. For complete information, see
SAS SQL Procedure User's Guide.
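For example, a PROC SQL view can deliver the observations in BY-variable order to a DATA step. This sketch assumes the data set INFORMATION with the variables State and ZipCode from the sorting example in this section. The view name INFO_BY_STATE and the output data set name are hypothetical.

```sas
/* Define a view that returns the observations ordered by State and ZipCode. */
/* The ordering is applied each time the view is read.                       */
proc sql;
   create view info_by_state as
      select *
         from information
         order by State, ZipCode;
quit;

/* Read the view as if it were a sorted data set. */
data first_zip_per_state;
   set info_by_state;
   by State ZipCode;
   if first.State;     /* keep the first observation of each State group */
run;
```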
Note: SAS/ACCESS Users: If you use SAS views or librefs, see SAS/ACCESS for
Relational Databases: Reference for information about using BY groups in your
Preprocessing Input Data for BY-Group Processing
Sorting Observations for BY-Group Processing
You can use the SORT procedure to change the physical order of the observations in the
data set. You can either replace the original data set, or create a new, sorted data set by
using the OUT= option of the SORT procedure. In this example, PROC SORT
rearranges the observations in the data set INFORMATION based on ascending values
of the variables State and ZipCode, and replaces the original data set.
/* Sort INFORMATION in place by State, then by ZipCode within State. */
proc sort data=information;
   by State ZipCode;
run;
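To keep the original data set unchanged, the same sort can write to a new data set with the OUT= option. The output name INFORMATION_SORTED is a hypothetical choice for this sketch.

```sas
/* Leave INFORMATION as it is and write the sorted copy to a new data set. */
proc sort data=information out=information_sorted;
   by State ZipCode;
run;
```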
As a general rule, when you use PROC SORT, specify the variables in the BY statement
in the same order that you plan to specify them in the BY statement in the DATA step.
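The following sketch shows a DATA step whose BY statement matches the sort order. It assumes the data set INFORMATION with the variables State and ZipCode. The output data set name ZIP_COUNTS is hypothetical.

```sas
proc sort data=information;
   by State ZipCode;
run;

/* The BY variables appear in the same order as in PROC SORT. */
data zip_counts;
   set information;
   by State ZipCode;
   if first.ZipCode then Count = 0;
   Count + 1;
   if last.ZipCode then output;
   keep State ZipCode Count;
run;
```

Reversing the order in the DATA step (BY ZipCode State) would generally stop with an error that the BY variables are not properly sorted, because the data set is ordered by ZipCode only within each State.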
For a detailed description of the default sorting orders for numeric and character
variables, see the SORT procedure in Base SAS Procedures Guide.
Note: The BY statement honors the linguistic collation of sorted data when you use the
SORT procedure with the SORTSEQ=LINGUISTIC option.
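A minimal sketch of a linguistic sort followed by BY-group processing is shown below. It assumes the data set INFORMATION from the earlier example. The output data set names are hypothetical.

```sas
/* Sort with linguistic collation rather than the default binary collation. */
proc sort data=information out=information_ling sortseq=linguistic;
   by State ZipCode;
run;

/* The BY statement follows the linguistic order recorded by PROC SORT. */
data first_per_state;
   set information_ling;
   by State;
   if first.State;
run;
```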