BUY THIS BOOK
Add to Cart

Print Book $49.99


Add to Cart

PDF $39.99

Safari Books Online

What is this?

Add to UK Cart

Print Book £35.50

What is this?

Looking to Reprint or License this content?


Access Data Analysis Cookbook
Access Data Analysis Cookbook

By Ken Bluttman, Wayne S. Freeze
Book Price: $49.99 USD
£35.50 GBP
PDF Price: $39.99

Cover | Table of Contents | Colophon


Table of Contents

Chapter 1: Query Construction
Select queries are an essential part of any database system. These queries, which passively gather data (without changing the source data), are what we rely on to answer our questions about our data. In its most basic form, a select query merely returns records from a table verbatim. That's not of much interest, since viewing the table itself would provide the same information. It's when you add criteria, joins, and useful SQL functions and methods that select queries become valuable.
This chapter provides several tips for getting select queries to go the extra mile. Recipes in this chapter explain how to prompt for criteria at runtime, how to use logic operators to get the criteria just the way you need them, and how to handle duplicate records.
To make queries easier to read and work with, you'll also find a recipe on using aliases, which provides a neat method to give nicknames to your tables. Another recipe explains how to use union queries to work around the problem of how to combine data from different tables so it can be treated as one source.
I have a table that lists expenses incurred by employees. Some of these records do not match any records in the Employees table. How can I easily get a list of these unmatched expense records without having to examine every record in the table?
A special type of join called a left join (see ) is used to identify records in one table that do not have matches within another table. The match, of course, has to be tested on a common field between tables—usually the unique key field of the parent table. The technique depends on having the criterion call for the matching field to be Null in the parent table. In other words, the query should return records from the child table in which no record (a Null) is found in the parent table.
Confused? Luckily, you can spare yourself the challenge of creating that query by using the Find Unmatched Query Wizard. The wizard will create the underlying SQL and run the query for you.
shows two tables: one lists employees, and the other lists expenses for which employees need to be reimbursed.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Finding Unmatched Records
I have a table that lists expenses incurred by employees. Some of these records do not match any records in the Employees table. How can I easily get a list of these unmatched expense records without having to examine every record in the table?
A special type of join called a left join (see ) is used to identify records in one table that do not have matches within another table. The match, of course, has to be tested on a common field between tables—usually the unique key field of the parent table. The technique depends on having the criterion call for the matching field to be Null in the parent table. In other words, the query should return records from the child table in which no record (a Null) is found in the parent table.
Confused? Luckily, you can spare yourself the challenge of creating that query by using the Find Unmatched Query Wizard. The wizard will create the underlying SQL and run the query for you.
shows two tables: one lists employees, and the other lists expenses for which employees need to be reimbursed.
Figure 1-1: Employees and EmployeeReimbursements tables
A number of records in the Employee Reimbursements table are "orphan" records— that is, they do not match any employee records in the table on the left (the parent table). The Find Unmatched Query Wizard will identify these records for you. From the Query tab in the Access database window, click the New button, or use the Insert → Query menu option to display the New Query dialog box shown in . Select Find Unmatched Query Wizard, and click the OK button.
The wizard runs through a few screens. You'll need to:
  1. Select the table or query that contains the records you want to identify. In this example, the EmployeeReimbursements table contains the records of interest (that is, the records that have no matches to the employee records themselves).
  2. Select the table that contains the records to match against.
  3. From each table, select the field to match on. Often this is the key field in one table and a foreign key in the other table.
  4. Select which fields from the table or query chosen in the first step should be included in the returned records.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Making AND and OR Do What You Expect
Logic operators are not conceptually difficult to follow, but combining and nesting them does add complexity. If you don't construct complex SQL statements very carefully, they may return incorrect or incomplete results, sometimes without reporting any errors.
Logic operators provide the flexibility to construct criteria in any way that suits your requirements. The AND operator returns true when all conditions are met; the OR operator returns true as long as one condition is met. In terms of how this applies to SQL construction, OR is used to set criteria for which one condition must be met, while AND is used to set criteria for which all the conditions must be met. Some examples are presented in .
Table 1-1: Examples of using logic operators
SQL statement
Description
SELECT DISTINCT State, City,
Count(LastName) AS Customers
FROM tblCustomers
GROUP BY State, City
HAVING State="NY" AND City="Yonkers"
This gives a count of customers located in Yonkers, NY. Only customer records in which both the state is New York and the city is Yonkers are counted.
SELECT DISTINCT State, City,
Count(LastName) AS Customers
FROM tblCustomers
GROUP BY State, City
HAVING State="NY"AND City="Yonkers" OR
City="Albany"
This gives a count of customer records for which the state is New York and the city is either Yonkers or Albany.
This produces an unintended result. The OR statement does not properly apply to both Yonkers and Albany. Any Yonkers customers must be in New York, but the way this SQL statement is constructed, Albany customers do not have to be in New York. Consequently, as shows, customers in Albany, GA will also be returned.
SELECT DISTINCT State, City,
Count(LastName) AS Customers
FROM tblCustomers
GROUP BY State, City
HAVING State="NY" AND (City="Yonkers" OR
City="Albany")
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Working with Criteria Using the IN Operator
Using multiple OR operators in the query grid makes for an unmanageable experience. If too many values and ORs are placed in a grid column, the column may expand to be bigger than the viewable area.
A way to save space in the query grid is to use the IN operator. IN is used in conjunction with a list of values from which any value can be returned. This essentially means that the IN operator works in the same fashion as the OR operator. It is not required that all conditions be met; meeting one of the conditions suffices.
Here is a SQL statement that returns records for students that took at least one of the listed courses:
	SELECT Students.Student, Student_Grades.Course,
	Student_Grades.Instructor
	FROM Students INNER JOIN Student_Grades ON
	Students.StudentID = Student_Grades.StudentID
	WHERE
	(((Student_Grades.Course)="Beginner Access"))
	OR
	(((Student_Grades.Course)="Beginner Excel"))
	OR
	(((Student_Grades.Course)="Advanced Access"))
	OR
	(((Student_Grades.Course)="Advanced Excel"));
Using IN provides a more streamlined SQL statement. Notice how the WHERE section has shrunk:
	SELECT Students.Student, Student_Grades.Course,
	Student_Grades.Instructor
	FROM Students INNER JOIN Student_Grades ON
	Students.StudentID = Student_Grades.StudentID
	WHERE Student_Grades.Course In
	("Beginner Access","Beginner Excel",
	"Advanced Access","Advanced Excel");
The IN operator provides a syntax convenience. It makes it easier to eyeball a set of criteria values to which OR logic is applied. shows an example of using IN to return records where the instructor is either Brown or Maxwell.
Figure 1-7: Using the IN operator to specify the instructor
That's simple enough to follow: when the instructor is either Brown or Maxwell, the record is returned. shows an example of using IN in two fields.
The example shown in returns records in which either Brown or Maxwell taught Beginner Access, Advanced Access, or Intro to VBA. In other words, all combinations of these instructors and courses are returned.
Figure 1-8: Using the IN operator for both the Instructor and Course fields
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Excluding Records with the NOT Operator
I have a large number of client names in my data. I need to return a list of clients that are not on the Hold list. Most clients are OK, so most will be returned in the query. How do I keep out the few clients who are on hold?
The method here is to exclude records from being returned, rather than the typical approach of identifying records that are to be returned. shows two data-base tables. The table on the left is a list of client orders. The table on the right is a list of clients (by ClientID) who are "on hold"—that is, clients whose accounts are in arrears and whose orders should not be shipped. Running a query that causes the clients identified in the OnHold table to be excluded from the Clients table is the key to this recipe.
A subquery works well here to gather the records from the second table into the query result. Using the NOT operator provides the twist to make the records excluded instead of included.
The NOT operator is placed in front of the subquery to reverse the logic. If NOT were left out, the query would return records that match in both tables. When NOT is applied, only those records from the Clients table that do not have matching records in the OnHold table are returned. Here is the SQL statement:
	SELECT Clients.ClientID, Clients.Client,
	Clients.OrderDate, Clients.OrderAmount
	FROM Clients
	WHERE (((Clients.ClientID)
	NOT In (Select ClientID from OnHold)));
Figure 1-9: A table of clients and a table of clients on hold
NOT is a logic operator that reverses a Boolean state, so NOT true equals false, and NOT false equals true. When a query calls for matching criteria, preceding the criteria construct with NOT flips this around and calls for records that specifically do not match the criteria.
Our sample Clients table has 200 records, and the OnHold table has 8 records. The result is that the query returns 192 records—that is, all orders for clients who are not on hold.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Parameterizing a Query
I need to construct a query that takes a criterion, but the criterion's value will not be known until the query is run. When it's time to run the query, the user needs a way to supply the criterion without going into the design of the query.
A query can be designed to accept parameters at the time it is run. Typically, an input box will appear in which the user enters the value for the criterion. A query can have any number of criteria entered in this fashion. A set of brackets defines the question asked in the input box. The brackets and the prompt to the user are placed in the Criteria row of the query grid for the given field. For example, using "[Enter an age]" as the criterion for a field instructs Access to present this prompt in a dialog box, as shown in .
Figure 1-10: Prompting the user to enter a parameter into a query
When a query is run, a traditional form is often displayed to enable users to enter parameter values or make selections from a list. But the ability to place parameters directly in the structure of a query provides a great alternative to having to build a form that gathers input. When the criteria are simple, just using brackets in the query design will suffice.
shows the query design that prompts the user to enter an age. When the query is run, the dialog shown in will appear, and the returned records will be filtered to those that match the entered value.
Figure 1-11: The design of the query with the age parameter
Here is the actual SQL statement that is built using the query grid:
	SELECT Name_City_Age.ID, Name_City_Age.FirstName,
	Name_City_Age.LastName, Name_City_Age.City,
	Name_City_Age.Age
	FROM Name_City_Age
	WHERE (((Name_City_Age.Age)=[Enter an age]));
Note that in the WHERE clause the phrase "Enter an age" appears enclosed in brackets.
Although the phrase "Enter an age" is used here to define the criterion for a field named Age, there is no strict requirement to use the word "age" in the bracketed phrase. We could just as well have used "Enter a number"; it wouldn't matter because the text in the brackets does not have to contain the name of the field for which it is used.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Returning a Top or Bottom Number of Records
I have a large table of data that contains thousands of records and several dozen fields. I create models based on various fields and/or ranges of values in the fields. I use queries to set up the sums using SQL aggregates and expressions. This is exactly what I need, but the problem is that the number of records slows down the processing. When I'm testing calculations, I don't need all the records. How can I pull out just a handful of them to use for testing?
The SQL TOP predicate is just what is called for here. It lets you specify how many records to return, either as an exact number or as a percentage of the total number of records in the underlying table or query.
Let's say you have a standard select query such as the one shown in . The SQL statement is:
	SELECT SampleNum, Identifier, Fact1, Fact2,
	Fact3, Fact4, Fact5, Fact6, Fact7, Fact8
	FROM ConsumerTrendData;
Figure 1-16: A simple select query returns all records
To specify a subset of records to search through to test the query—say, 40—use the TOP predicate, as follows:
	SELECT TOP 40 SampleNum, Identifier, Fact1,
	Fact2, Fact3, Fact4, Fact5, Fact6,
	Fact7, Fact8
	FROM ConsumerTrendData;
TOP comes directly after the SELECT keyword, and is followed by the number of records to return. Instead of reducing the number of returned records based on criteria, TOP reduces the number of returned records without any bias.
When working with the Access query grid, you can opt to use TOP by going into the query properties. To do this, use the View → Properties menu option while designing the query. The properties sheet that opens may display the properties for a field. If this is the case, click on the top pane of the query designer (above the grid)but not on any tables—in other words, click on the empty area. This will ensure that the properties sheet displays the query properties (see ).
Figure 1-17: The Query Properties sheet
One of the properties is Top Values. In , you can see that the value of 40 is already entered.
To return a percentage of the records, you can place a percent sign
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Returning Distinct Records
When running select queries, you may need to control whether duplicate records are returned in the query result. However, there could be disagreement about what constitutes uniqueness and duplication. Often, a few fields may contain duplicate information among records, and it's the additional fields that bring unique values to the records. How can queries be managed with regard to controlling how duplicate information is handled?
shows a table in which there are records that are near duplicates. None are exact duplicates since the CustomerID field ensures uniqueness. However, the two records for Vickie Storm could be seen as duplicates, as all fields except the CustomerID field hold duplicate information. The records for Ebony Pickett also contain some duplicate information, although two different cities are listed.
Figure 1-19: A table with duplicates
SQL provides ways of handling how records such as these are returned or excluded when select queries are run. Access makes use of the SQL predicates Distinct and DistinctRow:
Distinct
Bearing in mind that not all fields need to be included in a select query, Distinct will exclude duplicates when the duplication occurs within just the selected fields, regardless of whether the complete set of record fields would prove the records to be unique.
DistinctRow
DistinctRow is used to manage duplicates in a query that joins tables. Assuming unique records in the parent table, DistinctRow lets you avoid having duplicates returned from the child table.
You can incorporate these predicates by using the query designer or writing them directly into the SQL statement. With a query in design mode, use the View → Properties menu option to display the Query Properties dialog box, shown in . Two properties are of interest here: Unique Values and Unique Records. These can both be set to No, but only one at a time can be set to Yes.
Figure 1-20: Setting the Unique Values and Unique Records properties
Setting Unique Values to Yes places the DISTINCT
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Returning Random Records
For efficient analysis work, I need to pull random records out of my source table. Each time I run a query, I'd like to have the records returned in an unknown order.
The technique to apply here is to sort the records on a random value using the Rnd function. shows a table with three fields. To return the records in a random order, pass the name of one of the fields as the argument to the Rnd function in the ORDER BY clause of the SQL statement.
For example, using the Temperature field, the SQL statement necessary to return the records in random order is:
	SELECT Samples.Location, Samples.Temperature, Samples.Date
	FROM Samples
	ORDER BY Rnd(Samples.Temperature);
shows the result of running the query. Bear in mind that each time the query is run, the records will be returned in a different order.
Using the Rnd function on one field while performing an ascending or descending sort on another field provides an interesting, sometimes useful result. For example, this SQL statement performs sorts on two fields (one ascending and one random):
	SELECT Samples.Location, Samples.Temperature,
	Samples.Date
	FROM Samples
	ORDER BY Samples.Location, Rnd(Samples.Temperature);
Figure 1-26: A table from which random records are required
Figure 1-27: Queried records are returned in a random order
shows the result of running this query. An ascending sort is done on the Location field, so Facility A records float to the top. However, the temperatures are sorted randomly. Thus, each time this query is run, all the Facility A records will be on top, but the Facility A records will be randomly sorted based on the way the Temperature field is handled.
Figure 1-28: One field is sorted in ascending order and another is randomly sorted
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Fine-Tuning Data Filtering with Subqueries
I need to determine which records in a table have above-average values for a particular quantitative field. How can I calculate the average and filter the records in one query?
The AVG aggregate function calculates the average value of a field across the records included in a query. While that is a straightforward operation, comparing the value in each record to the average presents a challenge. One way to do this is to use a subquery. A subquery is literally a query within a query, typically located within the WHERE section of the main query.
shows a table of teams and their scores for the season. The task is to identify which teams have a season score that is greater than the average of all the scores.
Figure 1-29: A table of teams and scores
A little finesse with SQL is required to identify the teams that beat the average. The AVG aggregate function (see )is needed, but it is not applied in the typical way in the query grid. unmatched_ shows how the query is entered into the query grid. Select View → Totals while designing the query to display the Total row in the grid. Then, create an additional column in the grid based on the SeasonScore field. Don't select AVG in the Total row; instead, select Where from the drop-down list, and enter the subquery in the Criteria row.
Figure 1-30: A subquery design in the query grid
In this example, the greater-than sign (>) precedes the subquery, since we are looking for scores that are greater than the average. The AVG function appears in the subquery itself, which has the following syntax:
	Select AVG(SeasonScore) From SeasonScores
Here's the full SQL statement:
	SELECT Team, SeasonScore
	FROM SeasonScores
	WHERE (((SeasonScore)>
	(Select AVG(SeasonScore) From SeasonScores)))
	GROUP BY Team, SeasonScore
	ORDER BY Team;
In the preceding example, the subquery resides in the WHERE section of the outer query. An alternative is to have the subquery act as one of the fields in the outer query's SELECT section. shows two tables and a query. On the left is the SeasonScores table presented earlier. On the right is a related child table that lists game dates and locations for each team in the first table. The query, whose result is shown underneath the tables, has returned the date of the last game played by each team.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Combining Data with Union Queries
I need to combine sets of data so I can run analyses on them. The sets of data are identical but sit in different tables. There's no way to combine the data in the query grid. I could use append queries to copy data from the various tables to a master table, but this is inefficient. The data in the smaller tables changes from time to time, and having to rerun the appends is a nuisance. Isn't there a way to simply combine the data at any time as needed, so the latest data in the smaller tables is always present?
A union query is the perfect vehicle for combining identically structured data. To create a union query, place Union SQL clauses between the Select statements that query the tables.
shows three tables with an identical structure. Let's take a look at how to combine the data from these three tables.
Union queries must be written in the SQL pane of the query designer. It is not possible to represent them in the query grid. Here's a SQL statement written in the SQL pane of the query designer:
	SELECT * From SeasonScores_Putnam
	Union
	SELECT * From SeasonScores_Rockland
	Union
	SELECT * From SeasonScores_Westchester;
Figure 1-35: Three tables with identically structured data
Running the query returns a single set of data, shown in .
Figure 1-36: The result of running a union query
All the records from the three tables are now together in one place. This query can be saved and then used as the source for other queries and further analysis. For example, this saved query can be used in a query that calculates an average or some other summarization. If and when any data changes back in the source tables, the new data will flow through to the output of this union query, as each time it is rerun, it uses the latest data from the source tables.
A hard-and-fast rule is that all the selects feeding into a union query must have the same number of fields. In the previous example, this was a given because the three source tables were identical in structure. However, imagine assembling a list of names from various data tables, such as a Contacts table, a Prospects table, and a HolidayList table.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Inserting On-the-Fly Fields in Select Queries
I need to include additional information in a query's output. The information is sometimes based on the fields in the query, but at other times, it just needs to be inserted as a fixed message. How can I do this?
In the Field row of the Access query design grid, you can enter a name that will appear in the output as a field name, as any standard field name would. Follow this with a colon (:)and the value that will go into the new field, and you have created a new output field that exists only during the run of the query. This field is not saved back into any source tables. The value that goes into the new field can be dependent on other fields in the query, or it can be completely independent.
shows a table of clients and balances due.
Figure 1-39: A table of clients and balances due
shows a query based on the table. In the query are two created fields that do not actually exist in the table. The first is named Client. The value for the Client field comes from a concatenation of the FirstName and LastName table fields.
Another new field—Message—provides a fixed string when the query is run. The Message field is populated with an expression that has nothing to do with any table fields.
Figure 1-40: A query with expression-based fields
As shown in , the query result contains two fields that list the clients' full names and the fixed message.
Figure 1-41: The result of running the query
Using expression-based fields in queries provides ways to treat records with some intelligence. A useful example involves using an expression to return a message for certain records based on the value of a table field. shows how the IIf function is incorporated into our derived Message field. Now, the message about the Spring Special will appear only in records that have a balance of 100 or less.
Figure 1-42: Using a condition in a field expression
Here is the SQL statement for the query in :
	SELECT [FirstName] & " " & [LastName] AS Client,
	IIf([Balance]<=100,"Don't Miss Our Spring Special!","")
	AS MessageFROM tblClients;
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Using Aliases to Simplify Your SQL Statements
Table names precede field names in SQL statements, so queries that use multiple fields and tables wind up being very long. Is there a way to use shortcut identifiers instead for the table names?
Yes, there is! In a SQL statement, any table name can be given an alias. The place to do this is after the FROM keyword, where the table name is entered. Follow the table name with an alias of your choosing (make sure it is not a reserved word, an existing field name, etc.). Then, use the alias instead of the table name in the other areas of the SQL statement. Let's look at an example.
Using the Access query grid to assemble the query results in this SQL statement that addresses a single table:
	SELECT tblCustomers.CustomerCompanyName,
	tblCustomers.CustomerFirstName,
	tblCustomers.CustomerLastName,
	tblCustomers.CustomerAddr1,
	tblCustomers.CustomerAddr2,
	tblCustomers.CustomerCity,
	tblCustomers.CustomerState,
	tblCustomers.CustomerZip,
	tblCustomers.CustomerHomePhone,
	tblCustomers.CustomerWorkPhone
	FROM tblCustomers;
Here is the same query, this time using the alias C for tblCustomers. The alias is placed just after the table name in the FROM section, and all references to the table name in the rest of the query just use the alias:
	SELECT C.CustomerCompanyName,
	C.CustomerFirstName, C.CustomerLastName,
	C.CustomerAddr1, C.CustomerAddr2, C.CustomerCity,
	C.CustomerState, C.CustomerZip, C.CustomerHomePhone,
	C.CustomerWorkPhone
	FROM tblCustomers C;
This SQL statement is much shorter and easier to follow.
Aliases are also useful—perhaps even more so—with queries that address multiple tables. Here is the SQL statement of a query that addresses three tables (tblCustomers, tblInvoices, and tblInvoicePayments). Inner joins connect the tables on key fields:
	SELECT tblCustomers.CustomerCompanyName,
	tblCustomers.CustomerFirstName,
	tblCustomers.CustomerLastName,
	tblCustomers.CustomerAddr1,
	tblCustomers.CustomerAddr2,
	tblCustomers.CustomerCity,
	tblCustomers.CustomerState,
	tblCustomers.CustomerZip,
	tblInvoices.InvoiceNumber,
	tblInvoices.InvoiceDate,
	tblInvoices.Status,
	tblInvoices.Hours, tblInvoices.Rate,
	tblInvoicePayments.PaymentAmount,
	tblInvoicePayments.PaymentDate,
	tblInvoicePayments.PaymentType
	FROM (tblCustomers INNER JOIN tblInvoices ON
	tblCustomers.CustomerID = tblInvoices.CustomerID)
	INNER JOIN tblInvoicePayments ON
	tblInvoices.InvoiceID = tblInvoicePayments.InvoiceID;
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Creating a Left Join
I have a table of students and a table of courses they have taken. Not every student has taken a course. I want a listing of all the students and any courses they have taken, including students who have not yet taken a course. However, when I run the query normally, I only get back records of students who have taken at least one course.
shows the standard query you would use to query from two tables. This query will return all records from Students who have related records in Courses Taken. If, for a given student, there is more than one record in Courses Taken, the number of related records in Courses Taken is the number of records that will be returned for that given student. But students with no matched courses are left out of the returned records altogether.
shows an inner join. To ensure that all records from the master table (Students) appear in the results, the query must be changed to a left join. This is easy to do when the query is in design mode: either use the View → Join Properties menu option, or double-click on the line that connects the table to display the Join Properties dialog box, shown in .
In the Join Properties dialog box are three numbered options. The first one is the standard inner join. The second one creates a left join. The third option creates a right join (see ). Select the second option and click OK. Now, when the query is run, all records from the Students table will appear, as shown in .
Figure 1-43: An inner join query returns only matched records
Figure 1-44: Setting the properties for a left join
A left join returns all records from the master table, and probably all records from the child table (here, Courses Taken). This last fact depends on whether referential integrity exists between the tables. In other words, if referential integrity is enforced, each record in the Courses Taken table must match to a record in the Students table. Then, even though there are student records with no matching courses, all course records must belong to students and hence are returned in the query.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Creating a Right Join
I have a parent table and a child table. The parent table contains customers, and the child table contains purchases, but some records in the Purchases table do not belong to any customer. I need to run a query that returns all the records from the Purchases table, even if there is no matching customer. At the very least, this will help me identify purchases that are not being billed to anyone.
The request here is for a right join. In a right join, all records are returned from the child table, including those that have no match in the parent table. For any such records to exist in the child table, referential integrity must not exist between the tables. The presence of orphan records is possible only when such records can exist outside the confines of referential integrity with the parent table.
shows how a right join is created: use the third option in the Join Properties dialog box (displayed via the View → Join Properties menu command).
Figure 1-46: Setting up a right join in a query
When a right join query is run, the number of returned records matches the number of records in the child table (assuming no criteria were used). For fields from the parent table, there will be blank data for the records in which there is no match between tables. shows the result of running the right join query.
Figure 1-47: The result of running a right join query
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Creating an Outer Join
I wish to combine the output of both a left join and a right join into a single query, but I can't get Access to do this type of "outer join."
A left join will return all the records from the table on the right side (the parent)and any related records from the table on the left side (the child). A right join will return all the records from the table on the left side and any related records from the table on the right side. An outer join combines these two outputs into one.
Access doesn't directly support outer joins, but because it is reasonable to create left and right joins, these two constructs can be brought together with a union query (see ).
shows two database tables. Not all the teams in the Teams table have matching records in the Games table, and the Games table contains some records that have no relation to the teams in the Teams table.
To create an outer query, you must be in the SQL pane of the query designer. The following is the SQL that would simulate an outer query for these tables by combining the output of a left join with a right join:
	SELECT Teams.Team, Games.Date, Games.Location
	FROM Games LEFT JOIN Teams ON
	Games.TeamID=Teams.TeamID
	UNION
	SELECT Teams.Team, Games.Date, Games.Location
	FROM Games RIGHT JOIN Teams ON
	Games.TeamID = Teams.TeamID
	ORDER BY Team, Date;
When the query is run (see ), there are, as expected, some blanks in the fields from the Teams table and the Games table (from ). The majority of records are matched. Running a standard inner join query on these tables returns 35 records—the count of records that match. This outer join result returns 49 records. There are 14 records that have blanks for the source from one table or the other.
Figure 1-49: The result of running an outer join
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Chapter 2: Calculating with Queries
Queries can tell you a lot about your data. In addition to simply returning records, you can use aggregate functions that are part of the SQL language to return summaries of your data. There are aggregate functions that can sum, average, count, find the highest or lowest value, or return the standard deviation in the data, just to name a few. Not all are covered in the recipes in this chapter, but once you have a general idea of how aggregate functions work, you should be able to use any of them.
In this chapter, you'll also find recipes that showcase how to use your own custom, non-SQL functions from within a query. This is a very powerful technique because it enables you to design functions that deliver values that meet your exact requirements. And that's not all! There is also a recipe that describes how to use regular expressions, which provide powerful pattern-matching abilities, as well as recipes that demonstrate how to return all possible combinations of your data using a Cartesian product and how to construct crosstab queries.
Sometimes I need to summarize numerical data in a table. I know how to do this in a report—by using a calculation in a text box, in the report's footer. But how can I get a sum or average without having to create a report (or pull out a calculator)?
SQL provides aggregate functions that provide summaries of data. Two popular aggregate functions are Sum and Avg (for calculating the average). You can easily incorporate these into your query's design right within the query grid.
To use an aggregate function, select the View →Totals menu option while in the query grid (or, in Access 2007, click the Sigma (Σ) button in the Ribbon). This makes the Total row display in the grid. The Total row provides the assorted aggregate functions in a drop-down list.
Say you have a tblSales table containing a Customer_ID field, a PurchaseDate field, and an Amount field. There are records spanning dates from 2002 through 2005. To get a fast grand total of all the amounts in the table, just apply the Sum function to the Amount field. shows the design of the query, with the
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Finding the Sum or Average in a Set of Data
Sometimes I need to summarize numerical data in a table. I know how to do this in a report—by using a calculation in a text box, in the report's footer. But how can I get a sum or average without having to create a report (or pull out a calculator)?
SQL provides aggregate functions that provide summaries of data. Two popular aggregate functions are Sum and Avg (for calculating the average). You can easily incorporate these into your query's design right within the query grid.
To use an aggregate function, select the View →Totals menu option while in the query grid (or, in Access 2007, click the Sigma (Σ) button in the Ribbon). This makes the Total row display in the grid. The Total row provides the assorted aggregate functions in a drop-down list.
Say you have a tblSales table containing a Customer_ID field, a PurchaseDate field, and an Amount field. There are records spanning dates from 2002 through 2005. To get a fast grand total of all the amounts in the table, just apply the Sum function to the Amount field. shows the design of the query, with the Sum function selected from the drop-down list in the Total row for the Amount field.
Figure 2-1: Selecting the Sum aggregate function
shows the result of running the query.
Figure 2-2: The result of running the Sum query
Note that a derived field name, SumOfAmount, has been used in the result. If you want, you can change this name in the SQL or by using an expression in the Field row of the query grid. shows how you can change the returned field name to Grand Total by using that name in the Field row, along with a colon (:) and the real field name.
Figure 2-3: Specifying a dynamic field name
To return an average of the values, use the same approach, but instead of Sum, select Avg from the drop-down list in the Total row.
Aggregate functions are flexible. You can apply criteria to limit the number of records to which the aggregation is applied—for example, you can calculate the average amount from purchases in just 2005. shows how to design a query to do so. Note that the criterion to limit the Amount field is placed in the second column, not in the same column that contains the
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Finding the Number of Items per Group
I have a table of data containing customer address information. One of the fields is State. How do I create a query to return the number of customers in each state?
When a query uses aggregate functions, the Group By clause is a keystone. This SQL clause provides a grouping segregation, which then allows aggregation summaries to be applied per grouping. In this example, the Group By clause is applied to the State field. Then, the Count aggregate function is used to return the count of customers per group (i.e., per state).
shows the design of the query. The first column groups the State field. In the second column, the Count function is selected from the drop-down list in the Total row.
Figure 2-6: Using Count with Group By
shows the result of running the query. The number of customers in each state is returned.
Figure 2-7: Returned counts per state
The Group By clause can be applied to more than one field. In such a case, in sequence, each field that uses Group By further defines the narrowness of the count at the end.
Let's refine our count so we can see how many customers are in each city within each state. shows the breakdown of customers in Alaska by city. In the query result shown in , Alaska has a count of 20 customers. In the query result shown in , there are still 20 customers listed in Alaska, but now you can see the counts per city.
The SQL statement for the query in is this:
	SELECT tblCustomers.State, tblCustomers.City,
	Count(tblCustomers.CustomerID) AS Customers
	FROM tblCustomers
	GROUP BY tblCustomers.State, tblCustomers.City;
Figure 2-8: Count of customers grouped by state and city
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Using Expressions in Queries
I know how to create on-the-fly fields in a query (see ), and how to include some conditional intelligence. How, though, do I access a value that is not in an underlying table or query, but instead comes from another table altogether? How can I develop comprehensive expressions in a query?
Expressions are a powerful feature in Access. Expressions are used in a number of ways, such as for referencing forms and controls (whether from a macro, from a query, or from VBA). For this example, we'll use an expression within a query that accesses data from tables not included in the query's Select statement.
shows a query based on two tables: tblClients and tblPets. The key between these tables is ClientID. A field called PetID is the unique key for pets in the tblPets table. Two external tables, tblServiceDates and tblServiceDates_New, contain records of visits made by each pet. The first service date table, tblServiceDates, contains records for pets with PetIDs of up to 299. Higher PetIDs have service date records in the tblServiceDates_New table.
In the query in is a field based on a built-up expression, shown here:
	Last Service Date: IIf([PetID]<300,
	DLookUp("Max(DateOfService)",
	"tblServiceDates","[Pet_ID]=" & [PetID]),
	DLookUp("Max(DateOfService)",
	"tblServiceDates_New",
	"[Pet_ID]=" & [PetID]))
Figure 2-9: A query with an expression
This expression combines functions (IIf, DLookup, and Max) to find the last service date for each pet from the respective service date tables.
The full SQL statement that is generated from this design is:
	SELECT tblPets.PetID, tblClients.ClientLastName,
	tblPets.PetType, IIf([PetID]<300,
	DLookUp("Max(DateOfService)",
	"tblServiceDates","[Pet_ID]=" & [PetID]),
	DLookUp("Max(DateOfService)","tblServiceDates_New",
	"[Pet_ID]=" & [PetID])) AS [Last Service Date]
	FROM tblClients INNER JOIN tblPets ON
	tblClients.ClientID = tblPets.ClientID;
In summary, within a query, it is possible to build up a sophisticated expression that uses functions to address data and calculate results that stand outside of the standard SQL syntax.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Using Custom Functions in Queries
I often write long code routines to read through a table and process its data. It would be great if I could reduce the amount of code required by not creating and reading through a recordset. A select query addresses the table just as well. Is there a way to just apply the processing portion of the code direct from a query?
Calling a function from a query is relatively easy. Just use an extra column in the query grid to create a derived field. In that field, place an expression that calls the function; the returned value from the function is what will appear in the query result.
shows a table with records of activities performed for different clients. The records list the client name, the date when the work was done, the number of hours it took, and the type of work that was performed. The task is to calculate how much to charge for the work, per record.
First, let's develop a function that we can call from a query. It's important that this is a function and not just a sub. We need a returned value to appear in the query results, and while functions return values, subs do not. Here is the bill_amount function:
	Function bill_amount(the_date As Date, the_hours As Integer, _
	    the_client As String, WorkType As String) As Single
	bill_amount = 0 'in case of unexpected input
	Select Case WorkType
Figure 2-11: A table containing records of work performed for clients
	Case "Training"
	  bill_amount = the_hours * 80
	Case "Development"
	  bill_amount = the_hours * 120
	Case "Maintenance"
	  'Parker gets reduced rate regardless of day of week
	  'Other clients have separate weekday and weekend rates
	  If the_client <> "Parker" Then
	    If Weekday(the_date) = 1 Or Weekday(the_date = 7) Then
	      bill_amount = the_hours * 95
	    Else
	      bill_amount = the_hours * 75
	    End If
	  Else
	    bill_amount = the_hours * 60
	  End If
	End Select
	End Function
The function takes four arguments—one each for the four fields in the table—and calculates the amount to bill based on different facets of the data. The type of work performed, the client for whom it was performed, and whether it was done on the weekend (determined with the
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Using Regular Expressions in Queries
Regular expressions provide the ability to set up sophisticated string patterns for matching records. Can a regular expression be used as the criterion in a query?
Regular expressions, popularized by Perl and other Unix-based languages, provide powerful string-matching capabilities. For Access and other Windows-based applications, adding a reference to the VBScript Regular Expressions library makes using regular expressions possible. You can set up the reference in the Visual Basic Editor (VBE). To display the VBE from Access, just press Alt-F11. While in the VBE, use the Tools →References menu option to display the References dialog. Then set a reference to Microsoft VBScript Regular Expressions 5.5 (your version number may be different), as shown in .
Figure 2-15: Setting a reference to the VBScript Regular Expressions library
shows a table with hypothetical transaction records.
The values in the TransactionRecord field are concatenations of separate codes and values that follow this pattern:
  • The first character is a letter that signifies a type of transaction. For example, an A could mean an adjustment, and a D could mean a deposit.
  • The next two characters are numbers that represent a transaction type.
  • The next two characters are a department code. For example, LE is Legal, SA is Sales, HR is Human Resources, WA is Warehouse, and so on.
  • The last three numbers are an amount.
Figure 2-16: A table with transaction records
Given this pattern, let's construct a query that will identify transactions that are either adjustments or deposits, are for the Sales department, and are for amounts of 500 or greater. Put another way, our task will be to identify transaction records consisting of an A or a D, followed by any two numbers, followed by SA, followed by a number that is 5 or greater, and any two other numbers.
To make use of pattern matching, we'll use a custom function to create a regular expression (regexp) object. In a separate code module, enter this function:
	Function validate_transaction(transaction_record As String, _
	    match_string As String) As String
	  validate_transaction = "Invalid Record"
	  Dim regexp As regexp
	  Set regexp = New regexp
	  With regexp
	    .Global = True
	    .IgnoreCase = True
	    .Pattern = match_string
	    If .Test(transaction_record) = True Then
	      validate_transaction = "Valid Record"
	    End If
	  End With
	  Set regexp = Nothing
	End Function
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Using a Cartesian Product to Return All Combinations of Data
I have a list of teams and a list of locations with ballparks. I wish to create a master list of all the combinations possible between these two lists.
We typically use queries to limit the amount of returned records, and at the very least, we expect the number of returned records to be no greater than the number of records in the largest table or query being addressed.
However, there is a special type of join, called a Cartesian join, that returns the multiplicative result of the fields in the query (otherwise known as the Cartesian product). A Cartesian join is the antithesis of standard joins—it works as if there is no join. Whereas the other join types link tables together on common fields, no field linking is required to return a Cartesian product.
shows a table of teams and a table of locations. Simply put, we are looking for all the combinations that can exist between these two tables.
Figure 2-19: A table of teams and a table of locations
shows the design of a query that essentially has no join. There is no line connecting the tables. In fact, the single fields in each table really don't relate to each other.
The SQL generated by the design in looks like this:
	SELECT Teams1.Team, Locations.Location
	FROM Teams1, Locations;