You want to retrieve a result set containing a random sample of records from a result set using SQL Server 2005 or later.
Use either the TABLESAMPLE
T-SQL clause or a custom method. The solution shows how to use both techniques to return a random sample of 10 percent of records from the Person.Contact
table in AdventureWorks
. In either case, the top 10 sampled rows are returned.
The C# code in Program.cs in the project RetrieveRandomSampleRecords
is shown in Example 4-12.
Example 4-12. File: Program.cs for RetrieveRandomSampleRecords solution
using System; using System.Data; using System.Data.SqlClient; namespace RetrieveRandomSampleRecords { class Program { static void Main(string[] args) { string sqlConnectString = "Data Source=(local);" + "Integrated security=SSPI;Initial Catalog=AdventureWorks;"; string sqlSelect; DataTable dt; SqlDataAdapter da; // Get the count of records in Person.Contact. sqlSelect = "SELECT COUNT(*) FROM Person.Contact"; int recCount; using (SqlConnection connection = new SqlConnection(sqlConnectString)) { SqlCommand command = new SqlCommand(sqlSelect, connection); connection.Open(); recCount = (int)command.ExecuteScalar(); } Console.WriteLine("Person.Contact record count = {0}\n", recCount); // Fill the DataTable with Person.Contact data sqlSelect = "SELECT * FROM Person.Contact TABLESAMPLE (10 PERCENT)"; dt = new DataTable(); da = new SqlDataAdapter(sqlSelect, sqlConnectString); da.Fill(dt); // Output the top 10 rows using TABLESAMPLE Console.WriteLine("---{0} rows retrieved using TABLESAMPLE. " + "Displaying top 10 rows.---", dt.Rows.Count); for (int i = 0; i < 10; i++) Console.WriteLine("ID = {0}\t{1}, {2}", dt.Rows[i]["ContactID"], dt.Rows[i]["LastName"], dt.Rows[i]["FirstName"]); // Fill the DataTable with Person.Contact data sqlSelect = "SELECT * FROM Person.Contact WHERE " + "0.1 >= CAST(CHECKSUM(NEWID(), ContactID) & 0x7fffffff AS float) " + "/ CAST(0x7fffffff AS int)"; dt = new DataTable(); da = new SqlDataAdapter(sqlSelect, sqlConnectString); da.Fill(dt); // Output top 10 rows using custom method Console.WriteLine("\n---{0} rows retrieved using custom method. " + "Displaying top 10 rows.---", dt.Rows.Count); for (int i = 0; i < 10; i++) Console.WriteLine("ID = {0}\t{1}, {2}", dt.Rows[i]["ContactID"], dt.Rows[i]["LastName"], dt.Rows[i]["FirstName"]); Console.WriteLine("\nPress any key to continue."); Console.ReadKey(); } } }
The output is shown in Figure 4-12.
The TABLESAMPLE
clause, introduced in SQL Server 2005, quickly returns a random, representative sample of the table expressed as either an approximate number of rows or a percentage of the total rows. Unlike the TOP
clause, TABLESAMPLE
returns a result set containing a sampling of rows from all rows processed by the query. As you can see from the results in Figure 4-12, the TABLESAMPLE
clause does not return a truly random sample at the individual row level.
The TABLESAMPLE
clause syntax is:
TABLESAMPLE [SYSTEM] (sample_number [PERCENT | ROWS]) [REPEATABLE (repeat_seed)]
Where:
SYSTEM
An ANSI SQL keyword that specifies a database server-dependent sampling method. Although other databases support additional sampling methods that are database server-independent (e.g., DB2 supports
BERNOULLI
),SYSTEM
is the only method supported by SQL Server and the default value if not specified.sample_number [PERCENT | ROWS])
A numeric expression that specifies the number of rows to return or the percentage of rows in the result set to return.
REPEATABLE (repeat_seed)
The seed used to select rows to be returned in the sample.
REPEATABLE
indicates that the selected sample can be returned more than once. If the same seed is used, the same rows will be returned each time the query is run as long as no changes have been made to the data in the table.
The TABLESAMPLE
clause cannot be used with views or in an inline table-valued function.
The custom technique calculates a CHECKSUM
based on a NEWID()
and the ContactID
value for each row and scales it to a value between 0 and 1. This value is compared to 0.1 to select the top 10 percent of rows.
Get ADO.NET 3.5 Cookbook, 2nd Edition now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.