Grouping Data
The GROUP BY clause groups data by a specified column or columns. When you use a
GROUP BY clause, you also use an aggregate function in the SELECT clause or in a
HAVING clause to instruct PROC SQL in how to summarize the data for each group.
PROC SQL calculates the aggregate function separately for each group.
Grouping by One Column
The following example sums the populations of all countries to find the total population
of each continent:
libname sql 'SAS-library';
proc sql;
title 'Total Populations of World Continents';
select Continent, sum(Population) format=comma14. as TotalPopulation
from sql.countries
where Continent is not missing
group by Continent;
Note: Countries for which a continent is not listed are excluded by the WHERE clause.
Output 2.41 Grouping by One Column
Grouping without Summarizing
When you use a GROUP BY clause without an aggregate function, PROC SQL treats
the GROUP BY clause as if it were an ORDER BY clause and displays a message in the
log that informs you that this has happened. The following example attempts to group
high and low temperature information for each city in the Sql.WorldTemps table by
country:
libname sql 'SAS-library';
64 Chapter 2 Retrieving Data from a Single Table
proc sql outobs=12;
title 'High and Low Temperatures';
select City, Country, AvgHigh, AvgLow
from sql.worldtemps
group by Country;
The output and log show that PROC SQL transforms the GROUP BY clause into an
ORDER BY clause.
Output 2.42 Grouping without Aggregate Functions
Log 2.2 Grouping without Aggregate Functions (Partial Log)
WARNING: A GROUP BY clause has been transformed into an ORDER BY clause because
neither the SELECT clause nor the optional HAVING clause of the
associated table-expression referenced a summary function.
Grouping by Multiple Columns
To group by multiple columns, separate the column names with commas within the
GROUP BY clause. You can use aggregate functions with any of the columns that you
select. The following example groups by both Location and Type, producing total square
miles for the deserts and lakes in each location in the Sql.Features table:
libname sql 'SAS-library';
proc sql;
Grouping Data 65
title 'Total Square Miles of Deserts and Lakes';
select Location, Type, sum(Area) as TotalArea format=comma16.
from sql.features
where type in ('Desert', 'Lake')
group by Location, Type;
Output 2.43 Grouping by Multiple Columns
Grouping and Sorting Data
You can order grouped results with an ORDER BY clause. The following example takes
the previous example and adds an ORDER BY clause to change the order of the
Location column from ascending order to descending order:
libname sql 'SAS-library';
proc sql;
title 'Total Square Miles of Deserts and Lakes';
select Location, Type, sum(Area) as TotalArea format=comma16.
from sql.features
where type in ('Desert', 'Lake')
group by Location, Type
order by Location desc;
66 Chapter 2 Retrieving Data from a Single Table
Output 2.44 Grouping with an ORDER BY Clause
Grouping with Missing Values
Finding Grouping Errors Caused by Missing Values
When a column contains missing values, PROC SQL treats the missing values as a
single group. This can sometimes provide unexpected results.
In this example, because the Sql.Countries table contains some missing values in the
Continent column, the missing values combine to form a single group that has the total
area of the countries that have a missing value in the Continent column:
libname sql 'SAS-library';
/* unexpected output */
proc sql outobs=12;
title 'Areas of World Continents';
select Name format=$25.,
Continent,
sum(Area) format=comma12. as TotalArea
from sql.countries
group by Continent
order by Continent, Name;
The output is incorrect because Bermuda, Iceland, and Kalaallit Nunaat are not actually
part of the same continent. However, PROC SQL treats them that way because they all
have a missing character value in the Continent column.
Grouping Data 67
Output 2.45 Finding Grouping Errors Caused by Missing Values (Unexpected Output)
To correct the query from the previous example, you can write a WHERE clause to
exclude the missing values from the results:
/* modified output */
proc sql outobs=12;
title 'Areas of World Continents';
select Name format=$25.,
Continent,
sum(Area) format=comma12. as TotalArea
from sql.countries
where Continent is not missing
group by Continent
order by Continent, Name;
68 Chapter 2 Retrieving Data from a Single Table

Get SAS 9.4 SQL Procedure User's Guide, Third Edition, 3rd Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.