O'Reilly logo

SAS 9.4 Language Reference, 6th Edition by SAS Institute

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Indexing for BY-Group Processing
You can also ensure that observations are processed in ascending numeric or character
order by creating an index based on one or more variables in the SAS data set. If you
specify a BY statement in a DATA step, SAS looks for an appropriate index. If it finds
the index, SAS automatically retrieves the observations from the data set in indexed
order.
Note: Because creating and maintaining indexes require additional resources, you
should determine whether using them significantly improves performance.
Depending on the nature of the data in your SAS data set, using PROC SORT to
order data values can be more advantageous than indexing. For an overview of
indexes, see “Understanding SAS Indexes” on page 638.
How the DATA Step Identifies BY Groups
Processing Observations in a BY Group
In the DATA step, SAS identifies the beginning and end of each BY group by creating
two temporary variables for each BY variable: FIRST.variable and LAST.variable.
These temporary variables are available for DATA step programming but are not added
to the output data set. Their values indicate whether an observation is one of the
following positions:
the first one in a BY group
the last one in a BY group
neither the first nor the last one in a BY group
both first and last, as is the case when there is only one observation in a BY group
You can take actions conditionally, based on whether you are processing the first or the
last observation in a BY group.
Using a Name Literal as the BY-Group Variable
When you designate a name literal as the BY variable in BY-group processing and you
want to refer to the corresponding FIRST. or LAST. temporary variables, you must
include the FIRST. or LAST. portion of the two-level variable name within quotation
marks. Here is an example:
data sedanTypes;
set cars;
by 'Sedan Types'n;
if 'first.Sedan Types'n then type=1;
run;
For more information about BY-Group Processing and how SAS creates the temporary
variables, FIRST and LAST, see “How SAS Determines FIRST.variable and
LAST.variable” on page 455 and “How SAS Identifies the Beginning and End of a BY
Group” in SAS Statements: Reference.
454 Chapter 20 BY-Group Processing in the DATA Step
How SAS Determines FIRST.variable and LAST.variable
When an observation is the first in a BY group, SAS sets the value of FIRST.variable to
1 for the variable whose value changed, as well as for all of the variables that follow in
the BY statement. For all other observations in the BY group, the value of
FIRST.variable is 0. Likewise, if the observation is the last in a BY group, SAS sets the
value of LAST.variable to 1 for the variable whose value changes on the next
observation, as well as for all of the variables that follow in the BY statement. For all
other observations in the BY group, the value of LAST.variable is 0. For the last
observation in a data set, the value of all LAST.variable variables are set to 1.
Note: See “SAS Name Literals” on page 31 for more information about SAS name
literals.
Example 1: Grouping Observations by State, City, ZIP code, and
Street
This example shows how SAS uses the FIRST.variable and LAST.variable to flag the
beginning and end of four BY groups: State, City, ZipCode, and Street. Six temporary
variables are created within the program data vector. These variables can be used during
the DATA step, but they do not become variables in the new data set.
In the figure that follows, observations in the SAS data set are arranged in an order that
can be used with this BY statement:
by State City ZipCode;
SAS creates the following temporary variables: FIRST.State, LAST.State, FIRST.City,
LAST.City, FIRST.ZipCode, and LAST.ZipCode.
options pageno=1 nodate linesize=80 pagesize=60;
data testfile;
input State $ ZipCode $ City $ Street $ 19-33;
datalines;
AZ 85730 Tucson Gleeson Place
FL 33133 Miami Rice Street
FL 33133 Miami Thomas Avenue
FL 33133 Miami Surrey Drive
FL 33146 Miami Nervia Street
FL 33146 Miami Corsica Street
OH 45056 Miami Myrtle Street
;
data test2;
set testfile;
by State City ZipCode;
put _N_= state= first.state= last.state= first.city= last.city=
first.ZipCode= last.ZipCode= ;
run;
NOTE: PROCEDURE PRINTTO used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
79 options pageno=1 nodate linesize=80 pagesize=60;
80 data testfile;
81 input State $ ZipCode $ City $ Street $ 19-33;
How the DATA Step Identifies BY Groups 455

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required