O'Reilly logo

Access Data Analysis Cookbook by Wayne S. Freeze, Ken Bluttman

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Returning Distinct Records

Problem

When running select queries, you may need to control whether duplicate records are returned in the query result. However, there could be disagreement about what constitutes uniqueness and duplication. Often, a few fields may contain duplicate information among records, and it's the additional fields that bring unique values to the records. How can queries be managed with regard to controlling how duplicate information is handled?

Solution

Figure 1-19 shows a table in which there are records that are near duplicates. None are exact duplicates since the CustomerID field ensures uniqueness. However, the two records for Vickie Storm could be seen as duplicates, as all fields except the CustomerID field hold duplicate information. The records for Ebony Pickett also contain some duplicate information, although two different cities are listed.

A table with duplicates

Figure 1-19. A table with duplicates

SQL provides ways of handling how records such as these are returned or excluded when select queries are run. Access makes use of the SQL predicates Distinct and DistinctRow:

Distinct

Bearing in mind that not all fields need to be included in a select query, Distinct will exclude duplicates when the duplication occurs within just the selected fields, regardless of whether the complete set of record fields would prove the records to be unique.

DistinctRow

DistinctRow is used to manage duplicates in a query that joins tables. Assuming unique records in the parent table, DistinctRow lets you avoid having duplicates returned from the child table.

You can incorporate these predicates by using the query designer or writing them directly into the SQL statement. With a query in design mode, use the View → Properties menu option to display the Query Properties dialog box, shown in Figure 1-20. Two properties are of interest here: Unique Values and Unique Records. These can both be set to No, but only one at a time can be set to Yes.

Setting the Unique Values and Unique Records properties

Figure 1-20. Setting the Unique Values and Unique Records properties

Setting Unique Values to Yes places the DISTINCT predicate in the SQL statement. For example:

	SELECT DISTINCT Customers.FirstName,
	Customers.LastName, Customers.Address,
	Customers.City, Customers.State
	FROM Customers;

Similarly, setting the Unique Records property to Yes places the DISTINCTROW predicate just after the SELECT keyword.

Discussion

For our sample table, a simple select query of the Customers table on just the FirstName and LastName fields would return nine records, without regard to the fact that the returned results would show two records for Ebony Pickett and two records for Vickie Storm. Using Distinct in the SQL statement will change the returned count to seven records. In particular, this SQL statement:

	Select Distinct FirstName, LastName From Customers
	Order By LastName

produces the result shown in Figure 1-21.

Distinct records are returned

Figure 1-21. Distinct records are returned

When the City and State fields are added to the SQL statement, like this:

	Select Distinct FirstName, LastName, City, State
	From Customers
	Order By LastName

eight records are returned (see Figure 1-22). The additional record appears because Ebony Pickett is listed in two unique cities. As far as the query goes, there are now two unique Ebony Pickett records, and they are both returned. Vickie Storm still has just one record returned, however, because the source data for her city and state are identical in both of her records.

Using DistinctRow

Now, let's take a closer look at using DistinctRow, which manages duplicates in multitable joins. Figure 1-23 shows two tables: a Customers table (this table does not contain any duplicates) and a table of purchases related back to the customers.

Distinct records are returned based on additional fields

Figure 1-22. Distinct records are returned based on additional fields

Customers and Purchases tables

Figure 1-23. Customers and Purchases tables

Say you want to find out which customers have placed orders. A SQL statement that joins the tables but does not use DistinctRow will return a row count equivalent to the number of records in the child (Purchases)table. Here is a simple SQL statement that returns the names of the customers who placed each of the orders:

	SELECT Customers.CustomerID, Customers.FirstName,
	Customers.LastName
	FROM Customers INNER JOIN Purchases ON
	Customers.CustomerID = Purchases.CustomerID;

The result of running this query is shown in Figure 1-24. No fields from the Purchases table have been included, but the effect of the multiple child records is seen in the output—a customer name is listed for each purchase.

The simple query returns duplicate master records

Figure 1-24. The simple query returns duplicate master records

Adding the DistinctRow predicate ensures that the returned master records are free of duplicates:

	SELECT DistinctRow Customers.CustomerID,
	Customers.FirstName, Customers.LastName
	FROM Customers INNER JOIN Purchases ON
	Customers.CustomerID = Purchases.CustomerID;

The result is shown in Figure 1-25.

Using DistinctRow avoids duplicate records

Figure 1-25. Using DistinctRow avoids duplicate records

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required