
Step-by-Step Programming with Base SAS 9.4, Second Edition, by SAS Institute


The following output displays the results.
Figure 12.7 Selecting One Observation from Each BY Group
Working with Sorted Data
Understanding Sorted Data
By default, groups appear in ascending order of the BY values. In some cases you want
to emphasize the order in which the observations are sorted, not the fact that they can be
grouped. For example, you might want to alphabetize the tours by country.
To sort your data in a particular order, use the SORT procedure just as you do for
grouped data. When the sorted order is more important than the grouping, you usually
want only one observation with a given BY value in the resulting data set. Therefore,
you might need to remove duplicate observations.
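One way to keep a single observation for each BY value is the NODUPKEY option of PROC SORT. Unlike the NODUPRECS option described later in this section, NODUPKEY compares only the BY variables. The following is a minimal sketch, not part of the original examples. The data set WORK.TOURS and its values are made up for illustration.

```sas
/* Hypothetical sample data: two tours share the BY value Japan */
data work.tours;
   input Country $ Nights;
   datalines;
Spain 10
Japan 8
Japan 5
Brazil 8
;
run;

/* NODUPKEY writes only one observation for each value of Country */
proc sort data=work.tours out=work.onepercountry nodupkey;
   by Country;
run;

proc print data=work.onepercountry;
   title 'One Observation per Country';
run;
```

The output data set has three observations, one per country. Which of the two Japan observations survives depends on the order of the input data, so use this option only when any one observation per BY value is acceptable.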
Operating Environment Information
The SORT procedure accesses either a sorting utility that is supplied as part of SAS,
or a sorting utility that is supplied by the host operating environment. All examples
in this documentation use the SAS sorting utility. Some operating environment
utilities do not accept particular options, including the NODUPRECS option
described later in this section. The default sorting utility is set by your site. For more
information about the utilities available to you, see the documentation for your
operating environment.
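To see which sorting utility your site has selected, you can display the SORTPGM system option. This is a minimal sketch that assumes your operating environment supports SORTPGM. The values that it reports vary by host, so check the documentation for your operating environment.

```sas
/* Writes the current setting of the SORTPGM system option to the SAS log */
proc options option=sortpgm;
run;
```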
Sorting Data
The following example sorts data set MYLIB.ARCH_OR_SCEN by Country:
proc sort data=mylib.arch_or_scen out=bycountry;
by Country;
run;
proc print data=bycountry;
title 'Tours in Alphabetical Order by Country';
run;
192 Chapter 12 Working with Grouped or Sorted Observations
The following output displays the results.
Figure 12.8 Sorting Data
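The preceding example sorts in ascending order, which is the default. To reverse the order, place the DESCENDING keyword before the variable name in the BY statement. The following sketch uses a small made-up data set, WORK.TOURS, so that it can run on its own.

```sas
/* Hypothetical sample data, deliberately not in alphabetical order */
data work.tours;
   input Country $ LandCost;
   datalines;
Japan 1440
Spain 1020
Brazil 1150
;
run;

/* DESCENDING applies only to the variable that immediately follows it */
proc sort data=work.tours out=work.bycountry_desc;
   by descending Country;
run;

proc print data=work.bycountry_desc;
   title 'Tours in Reverse Alphabetical Order by Country';
run;
```

The observations appear in the order Spain, Japan, Brazil.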
Deleting Duplicate Observations
You can eliminate duplicate observations in a SAS data set by using the NODUPRECS
option with the SORT procedure. The following programs show you how to create a
SAS data set and then remove duplicate observations.
The external file shown below contains a duplicate observation for Switzerland:
Spain       architecture 10 1020 World
Japan       architecture  8 1440 Express
Switzerland scenery       9 1468 World
Brazil      architecture  8 1150 World
Switzerland scenery       9 1468 World
Ireland     scenery       7 1116 Express
New Zealand scenery      16 2978 Southsea
Italy       architecture  8  936 Express
Greece      scenery      12 1396 Express
The following DATA step creates a permanent SAS data set named
MYLIB.ARCH_OR_SCEN2.
libname mylib 'SAS-library';
data mylib.arch_or_scen2;
infile 'input-file';
input Country $ 1-11 TourType $ 13-24 Nights LandCost Vendor $;
run;
proc print data=mylib.arch_or_scen2;
title 'Data Set MYLIB.ARCH_OR_SCEN2';
run;
The following output shows that this data set contains a duplicate observation for
Switzerland.
Figure 12.9 Data Set MYLIB.ARCH_OR_SCEN2
The following program uses the NODUPRECS option in the SORT procedure to delete
duplicate observations. The program creates a new data set called FIXED:
proc sort data=mylib.arch_or_scen2 out=fixed noduprecs;
by Country;
run;
proc print data=fixed;
title 'Data Set FIXED: MYLIB.ARCH_OR_SCEN2 With Duplicates Removed';
run;
The following output displays messages that appear in the SAS log.
Log 12.2 SAS Log Indicating Deleted Duplicate Observations
697 proc sort data=mylib.arch_or_scen2 out=fixed noduprecs;
698 by Country;
699 run;
NOTE: There were 9 observations read from the data set MYLIB.ARCH_OR_SCEN2.
NOTE: 1 duplicate observations were deleted.
NOTE: The data set WORK.FIXED has 8 observations and 5 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
700
701 proc print data=fixed;
702 title 'Data Set FIXED: MYLIB.ARCH_OR_SCEN2 With Duplicates Removed';
703 run;
NOTE: There were 8 observations read from the data set WORK.FIXED.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds
The following output shows the results of the NODUPRECS option.
Figure 12.10 Data Set FIXED with No Duplicate Observations
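NODUPRECS compares each observation only with the previous observation written to the output data set. In the example above, sorting by Country is enough because the two Switzerland observations end up next to each other. When identical observations might be separated by other observations in the same BY group, a common approach is to sort by all variables. The following sketch uses made-up data in WORK.TOURS to show the idea.

```sas
/* Hypothetical data: the two identical Japan rows are separated
   by a Japan row with a different value of Nights */
data work.tours;
   input Country $ Nights;
   datalines;
Japan 8
Japan 5
Japan 8
Spain 10
;
run;

/* Sorting by every variable places identical observations side by side,
   so the duplicate check sees them as consecutive observations */
proc sort data=work.tours out=work.nodups noduprecs;
   by _all_;
run;

proc print data=work.nodups;
   title 'Duplicates Removed After Sorting by All Variables';
run;
```

With BY _ALL_, the output data set has three observations. With BY Country alone, the same data would typically keep all four, because the identical rows are not adjacent after the sort.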
Understanding Collating Sequences
Both numeric and character variables can be sorted into ascending or descending order.
For numeric variables, ascending or descending order is easy to understand. For
character variables, ascending or descending order is more complex. Character values
include uppercase and lowercase letters, special characters, and the digits 0 through 9
when they are treated as characters rather than as numbers.
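The following sketch shows how digits, uppercase letters, and lowercase letters sort as character values. It is an illustration with made-up data, not one of the original examples. The SORTSEQ=ASCII option is specified so that the result does not depend on the default collating sequence of your operating environment.

```sas
/* Hypothetical one-character values: digits and letters in mixed case */
data work.chars;
   input Value $;
   datalines;
b
B
9
a
A
1
;
run;

/* SORTSEQ=ASCII requests the ASCII collating sequence for this sort */
proc sort data=work.chars out=work.chars_ascii sortseq=ascii;
   by Value;
run;

proc print data=work.chars_ascii;
   title 'Character Values Sorted in ASCII Sequence';
run;
```

In ASCII sequence, the values appear in the order 1, 9, A, B, a, b: digits first, then uppercase letters, then lowercase letters. Other collating sequences, such as EBCDIC, order these groups differently.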
