Obtaining a Total for Each BY Group
An additional requirement of Tradewinds Travel is to determine the number of tours that
are booked with each vendor. In order to accomplish this task, a program must group the
data by a variable. That is, the program must organize the data set into groups of
observations, with one group for each vendor. In this case, the program must group the
data by the Vendor variable. Each group is known generically as a BY- group. The
variable that is used to determine the groupings is called a BY variable.
In order to group the data by the Vendor variable:
include a PROC SORT step to group the observations by the Vendor variable
use a BY statement in the DATA step
use a Sum statement to total the bookings
reset the Sum variable to 0 at the beginning of each group of observations
The following program sorts the data set by Vendor and sums the total bookings for each
vendor.
proc sort data=mylib.tourrevenue out=mylib.sorttour;
by Vendor;
run;
data totalby;
set mylib.sorttour;
by Vendor;
if First.Vendor then VendorBookings=0;
VendorBookings + NumberOfBookings;
run;
proc print data=totalby;
title 'Summary of Bookings by Vendor';
run;
In the preceding program, the FIRST.Vendor variable is used in an IF-THEN statement
to set the Sum variable (VendorBookings) to 0 in the first observation of each BY group.
(For more information about the FIRST.variable and LAST.variable temporary
variables, see “Finding the First or Last Observation in a Group” on page 189.)
204 Chapter 13 Using More Than One Observation in a Calculation
The following output displays the results.
Figure 13.4 Creating Totals for BY Groups
Notice that while this output does in fact include the total number of bookings for each
vendor, it also includes a great deal of extraneous information. Reporting the total
bookings for each vendor requires only the variables Vendor and VendorBookings from
the last observation for each vendor. Therefore, the program can use the following
elements:
the DROP= or KEEP= data set options to eliminate the variables Country, LandCost,
and NumberOfBookings from the output data set
the LAST.Vendor variable in a subsetting IF statement to write only the last
observation in each group to the data set TOTALBY
The following program creates data set TOTALBY:
proc sort data=mylib.tourrevenue out=mylib.sorttour;
by Vendor;
run;
data totalby(drop=country landcost NumberOfBookings);
set mylib.sorttour;
by Vendor;
if First.Vendor then VendorBookings=0;
VendorBookings + NumberOfBookings;
if Last.Vendor;
run;
proc print data=totalby;
Obtaining a Total for Each BY Group 205

Get Step-by-Step Programming with Base SAS 9.4, Second Edition, 2nd Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.