Grouped by X Y Z

_N_=1 FIRST.x=1 LAST.x=0 FIRST.y=1 LAST.y=0 FIRST.z=1 LAST.z=0

_N_=2 FIRST.x=0 LAST.x=0 FIRST.y=0 LAST.y=1 FIRST.z=0 LAST.z=1

_N_=3 FIRST.x=0 LAST.x=1 FIRST.y=1 LAST.y=1 FIRST.z=1 LAST.z=1

_N_=4 FIRST.x=1 LAST.x=1 FIRST.y=1 LAST.y=1 FIRST.z=1 LAST.z=1

Grouped by Y X Z

_N_=1 FIRST.y=1 LAST.y=0 FIRST.x=1 LAST.x=0 FIRST.z=1 LAST.z=0

_N_=2 FIRST.y=0 LAST.y=1 FIRST.x=0 LAST.x=1 FIRST.z=0 LAST.z=1

_N_=3 FIRST.y=1 LAST.y=0 FIRST.x=1 LAST.x=1 FIRST.z=1 LAST.z=1

_N_=4 FIRST.y=0 LAST.y=1 FIRST.x=1 LAST.x=1 FIRST.z=1 LAST.z=1

Processing BY-Groups in the DATA Step

Overview

The most common use of BY-group processing is to combine data sets by using the BY statement with the SET, MERGE, MODIFY, or UPDATE statements. (If you use a SET, MERGE, or UPDATE statement with the BY statement, your observations must be grouped or ordered.) When processing these statements, SAS reads one observation at a time into the program data vector. With BY-group processing, SAS selects the observations from the data sets according to the values of the BY variable or variables. After processing all the observations from one BY group, SAS expects the next observation to be from the next BY group.

The BY statement modifies the action of the SET, MERGE, MODIFY, or UPDATE statement by controlling when the values in the program data vector are set to missing. During BY-group processing, SAS retains the values of variables until it has copied the last observation that it finds for that BY group in any of the data sets. Without the BY statement, the SET statement sets variables to missing when it reads the last observation from any data set, and the MERGE statement does not set variables to missing after the DATA step starts reading observations into the program data vector.
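The retention behavior described above can be seen in a small match-merge. The following is a minimal sketch, not an example from the original text: the data sets DEPT and STAFF, the variable Manager, and all of the data values are hypothetical and exist only for illustration.

```sas
/* Hypothetical data sets for illustration only. */
data dept;
   input Department $ Manager $;
   datalines;
BAD Ruiz
DDG Chen
;

data staff;
   input Department $ Name $;
   datalines;
BAD Carol
BAD Linda
DDG Jason
;

/* Both inputs are already in Department order, which the */
/* BY statement requires when it is used with MERGE.      */
data combined;
   merge dept staff;
   by Department;
run;

proc print data=combined;
run;
```

Under these assumptions, COMBINED should contain three observations. The value of Manager that is read for department BAD is retained for both Carol and Linda, because SAS does not set it to missing until it has copied the last observation for that BY group from either data set.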

Processing BY-Groups Conditionally

You can process observations conditionally by using the subsetting IF or IF-THEN statements, or the SELECT statement, with the temporary variables FIRST.variable and LAST.variable (set up during BY-group processing). For example, you can use the IF or IF-THEN statements to perform calculations for each BY group and to write an observation when the first or the last observation of a BY group has been read into the program data vector.

The following example computes annual payroll by department. It uses IF-THEN statements and the values of the FIRST.variable and LAST.variable temporary variables to reset the value of PAYROLL to 0 at the beginning of each BY group and to write an observation after the last observation in a BY group is processed.

data salaries;
   /* :$9. reads Name with a length of 9 so that "Elizabeth" is */
   /* not truncated to the default length of 8 characters.      */
   input Department $ Name :$9. WageCategory $ WageRate;
   datalines;
BAD Carol Salaried 20000
BAD Elizabeth Salaried 5000
BAD Linda Salaried 7000
BAD Thomas Salaried 9000
BAD Lynne Hourly 230
DDG Jason Hourly 200
DDG Paul Salaried 4000
PPD Kevin Salaried 5500
PPD Amber Hourly 150
PPD Tina Salaried 13000
STD Helen Hourly 200
STD Jim Salaried 8000
;

proc print data=salaries;
run;

proc sort data=salaries out=temp;
   by Department;
run;

data budget (keep=Department Payroll);
   set temp;
   by Department;
   if WageCategory='Salaried' then YearlyWage=WageRate*12;
   else if WageCategory='Hourly' then YearlyWage=WageRate*2000;

   /* SAS sets FIRST.Department to 1 when this is the first */
   /* observation in a new Department BY group.             */
   if first.Department then Payroll=0;
   Payroll+YearlyWage;

   /* SAS sets LAST.Department to 1 when this is the last   */
   /* observation in the current Department BY group.       */
   if last.Department;
run;

proc print data=budget;
   format Payroll dollar10.;
   title 'Annual Payroll by Department';
run;

Output 22.1 Output from Conditional BY-Group Processing
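The text above also names the SELECT statement as a way to test FIRST.variable and LAST.variable, although the payroll example uses only IF statements. The following is a minimal sketch of the SELECT form that counts employees in each department. It assumes the sorted data set TEMP from the example above; the data set name HEADCOUNT and the variable Employees are hypothetical names introduced for illustration.

```sas
/* Assumes TEMP, the data set sorted by Department in the example above. */
/* HEADCOUNT and Employees are hypothetical names for illustration.      */
data headcount (keep=Department Employees);
   set temp;
   by Department;

   /* Keep the running count across iterations of the DATA step. */
   retain Employees 0;

   select;
      /* First observation of a BY group: restart the count. */
      when (first.Department) Employees=1;
      /* Any later observation in the same BY group. */
      otherwise Employees=Employees+1;
   end;

   /* Write one observation per department, after its last observation. */
   if last.Department then output;
run;

proc print data=headcount;
   title 'Employee Count by Department';
run;
```

With the sample SALARIES data, this sketch should write one observation per department, with counts of 5 for BAD, 2 for DDG, 3 for PPD, and 2 for STD. The explicit RETAIN statement is a design choice; a sum statement such as the one used for Payroll in the example above would retain the counter in the same way.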

Data Not in Alphabetic or Numeric Order

In BY-group processing, you can use data that is arranged in an order other than alphabetic or numeric, such as by calendar month or by category. To do this, use the NOTSORTED option in a BY statement when you use a SET statement. The NOTSORTED option in the BY statement tells SAS that the data is not in alphabetic or numeric order, but that it is arranged in groups by the values of the BY variable. You
