Use a Query to Retrieve a Random Set of Rows

Problem

You need to be able to retrieve a random set of rows from a table or a query so you can identify a random sample for a research study. You can’t find a way to make this happen in the normal query design grid. What’s the trick to getting a random sample of a certain number of rows?

Solution

The solution to this problem is not quite as simple as it might first appear, because of the way Access attempts to optimize the use of function calls in queries. You can call a VBA function to generate a random value for each row, but to ensure that your function runs for each row, and not just once, you need to feed it a value from the row. Once you’ve generated the random numbers, you can sort by that random column and use a Top Values query to select a random group.

In 01-07.MDB, open tblRandom. This table includes 50 rows of data. Your goal is to pull five randomly selected rows for this set of data. To do this, follow these steps:

  1. Import the module basRandom from 01-07.MDB or create your own, including this single function:

    Public Function acbGetRandom(varFld As Variant)
       
       ' Though varFld isn't used, it's the only way to force the query
       ' to call this function for each and every row.
    
       Randomize
       acbGetRandom = Rnd
    End Function
  2. Create a new select query or use an existing one. Add any fields you’re interested in.

  3. Add an extra column, with the following expression replacing the reference to the State field with a single field in your query’s underlying table or query (this query won’t run correctly unless you pass one of your field names to the function):

    acbGetRandom([State])

    You can clear this field’s Show checkbox, because there’s not much point in viewing a continually changing random number as part of your query output. Set the Sort value for the newly calculated field to Ascending (see Figure 1-21).

    The sample query, qryRandom, set up to retrieve five random rows

    Figure 1-21. The sample query, qryRandom, set up to retrieve five random rows

  4. Open the query’s properties sheet (make sure the View Properties menu item is checked, and click on the upper area of the query design surface so the properties sheet’s titlebar says Query Properties). Fill in the number of rows you’d like to return in the TopValues property. Figure 1-21 shows the sample query, qryRandom, in design view with the property filled in.

  5. Run the query. Your query grid should show you as many rows as you specified in the properties sheet. If you press Shift-F9, asking Access to requery the data, you will see a different set of rows. Repeating the process will return a different set of rows each time.

Discussion

The general concept behind this solution is simple: you add a new column to your query, fill it with a list of random numbers, sort on those random numbers, and retrieve the top n rows, where n is a number between 1 and the number of rows in your underlying data. There’s only one complicating factor: to create the random number, you need to call a function for each row. Access tries to optimize such a function call and will call it only once for the entire set of data, unless the function call involves a field in the data. That is, if you replace the call to acbGetRandom (in Step 3) with a simpler call directly to Access’s random number function (Rnd), you’ll find that every value in every row will be exactly the same. Access’s query engine thinks that the function has nothing to do with data in the query, so it calls the function only once. This makes the random number meaningless, as the whole point of using a random number is to generate a different one for each row.

The workaround, though, is simple: pass a field, any field, as a parameter to the function you call. That way, Access believes that the return value from the function is dependent on the data in each row and so calls the function once per row, passing to it the field you specify in the expression. The acbGetRandom function doesn’t really care about the value you pass it, because its only goal is to get a random number and return that back to the query. Once you successfully place a random number in each row Access will sort the data based on that number, because you specified Ascending for the column’s sorting.

Finally, by specifying the TopValues property for the query, you’re asking Access to return only that many rows as the result set of the query. If you want a certain percentage of the total rows, change it by adding the % sign after the Top value.

The acbGetRandom function includes a call to the VBA Randomize subroutine. By calling Randomize, you’re asking Access to give you a truly random result every time you call the function. If you omit this call, Access gives you the same series of random numbers each time you start it up and run this query. If you want a repeatable series of random rows, remove the call to Randomize. If you want a different set of rows each time you run the query, leave the Randomize statement where it is.

Because Access will pass a field value to the acbGetRandom function for each and every row of data in your data source, you’ll want to optimize this function call as much as you can. If possible, use either a very short text field (zip code, for example) or, even better, an integer. You must pass some value, but you want it to be as small as possible to minimize the amount of information that must be moved around for each row of the data.

Get Access Cookbook now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.