4.1. Determining the Differences in Data Between Two DataSet Objects

Problem

You have two DataSet objects with the same schema but containing different data and need to determine the difference between the data in the two.

Solution

Compare the two DataSet objects with the GetDataSetDifference() method in this solution and return the differences between the data as a DiffGram.

The solution creates two DataSet objects each containing a different subset of records from the HumanResources.Department table in AdventureWorks. The GetDataSetDifference() method takes two DataSet objects with identical schemas as arguments and returns a DiffGram of the differences between the data in the two.

The C# code in Program.cs in the project DetermineDataDifferenceDataSets is shown in Example 4-1.

Example 4-1. File: Program.cs for DetermineDataDifferenceDataSets solution

using System;
using System.Data;
using System.Data.SqlClient;
using System.IO;

namespace DetermineDataDifferenceDataSets
{
    class Program
    {
        static void Main(string[] args)
        {
            string sqlConnectString = "Data Source=(local);" +
                "Integrated security=SSPI;Initial Catalog=AdventureWorks;";

            // Fill DataSet A
            string sqlSelectA = "SELECT * FROM HumanResources.Department " +
                "WHERE DepartmentID BETWEEN 1 AND 5";
            DataSet dsA = new DataSet();
            SqlDataAdapter da = new SqlDataAdapter(sqlSelectA, sqlConnectString);
            da.TableMappings.Add("Table", "TableA");
            da.FillSchema(dsA, SchemaType.Source);
            da.Fill(dsA);
            // Set the primary key
            dsA.Tables["TableA"].PrimaryKey =
                new DataColumn[] { dsA.Tables["TableA"].Columns["DepartmentID"] };

            // Fill DataSet B
            string sqlSelectB = "SELECT * FROM HumanResources.Department " +
                "WHERE DepartmentID BETWEEN 4 AND 8";
            DataSet dsB = new DataSet();
            da = new SqlDataAdapter(sqlSelectB, sqlConnectString);
            da.TableMappings.Add("Table", "TableB");
            da.FillSchema(dsB, SchemaType.Source);
            da.Fill(dsB);
            // Set the primary key
            dsB.Tables["TableB"].PrimaryKey =
                new DataColumn[] { dsB.Tables["TableB"].Columns["DepartmentID"] };

            Console.WriteLine(GetDataSet Difference(dsA, dsB));

            Console.WriteLine("\nPress any key to continue.");
            Console.ReadKey();
        }

        private static string GetDataSetDifference(DataSet ds1, DataSet ds2)
        {
            // Accept any edits within the DataSet objects.
            ds1.AcceptChanges();
            ds2.AcceptChanges();

            // Create a DataSet to store the differences.
            DataSet ds = new DataSet();

            DataTable dt1Copy = null;
            // Iterate over the collection of tables in the first DataSet.
            for (int i = 0; i <  ds1.Tables.Count; i++)
            {
                DataTable dt1 = ds1.Tables[i];
                DataTable dt2 = ds2.Tables[i];

                // Create a copy of the table in the first DataSet.
                dt1Copy = dt1.Copy();

                // Iterate over the collection of rows in the
                // copy of the table from the first DataSet.
                foreach (DataRow row1 in dt1Copy.Rows)
                {
                    DataRow row2 = dt2.Rows.Find(row1["DepartmentID"]);
                    if (row2 == null)
                    {
                        // Delete rows not in table 2 from table 1.
                        row1.Delete();
                    }
                    else
                    {
                        // Modify table 1 rows that are different from
                        // table 2 rows.
                        for (int j = 0; j <  dt1Copy.Columns.Count; j++)
                        {
                            if (row2[j] == DBNull.Value)
                            {
                                // Column in table 2 is null,
                                // but not null in table 1
                                if (row1[j] != DBNull.Value)
                                    row1[j] = DBNull.Value;
                            }
                            else if (row1[j] == DBNull.Value)
                            {
                                // Column in table 1 is null,
                                // but not null in table 2
                                row1[j] = row2[j];
                            }
                            else if (row1[j].ToString() !=
                                row2[j].ToString())
                            {
                                // Neither column in table 1 nor
                                // table 2 is null, and the
                                // values in the columns are
                                // different.
                                row1[j] = row2[j];
                            }
                        }
                    }
                }

                foreach (DataRow row2 in dt2.Rows)
                {
                    DataRow row1 =
                        dt1Copy.Rows.Find(row2["DepartmentID"]);
                    if (row1 == null)
                    {
                        // Insert rows into table 1 that are in table 2
                        // but not in table 1.
                        dt1Copy.LoadDataRow(row2.ItemArray, false);
                    }
                }

                // Add the table to the difference DataSet.
                ds.Tables.Add(dt1Copy);
            }

            // Write a XML DiffGram with containing the differences between tables.
            StringWriter sw = new StringWriter();
            ds.WriteXml(sw, XmlWriteMode.DiffGram);

            return sw.ToString();
        }
    }
}

The output is shown in Figure 4-1.

Discussion

A DiffGram is an XML format used to specify original and current values for the data elements in a DataSet. It does not include any schema information. The DiffGram is used by .NET Framework applications as the serialization format for the contents of a DataSet including changes made to the Dataset.

A DiffGram is XML-based, which makes it platform-and application-independent. It is not, however, widely used or understood outside of Microsoft .NET applications.

The DiffGram format is divided into three sections: current, original, and errors. The original and current data in the DiffGram can also be used to report the differences between data in two DataSet objects. For more information about the DiffGram XML format, see Recipe 9.4.

The solution contains a method GetDataSetDifference() that takes two DataSet objects with the same schema as arguments and returns a DiffGram containing the differences in data when the second DataSet is compared to the first. Table 4-1 describes how the differences between the DataSet objects appear in the DiffGram.

Output for DetermineDataDifferenceDataSets solution

Figure 4-1. Output for DetermineDataDifferenceDataSets solution

Table 4-1. DiffGram representation of DataSet differences

Condition

DiffGram representation

Row is the same in both DataSet 1 and DataSet 2

Row data appears only in the current data section of the DiffGram without a diffgr:haschanges attribute. (In the example, rows with DepartmentID of 4 and 5 appear.)

Row is in both DataSet 1 and DataSet 2 but the rows do not contain the same data

Row data appears in the current data section of the DiffGram. The row element contains the attribute diffgr: hasChanges with a value of "modified". The data in the current section is the updated data. The original data appears in the original <diffgr:before> block of the DiffGram.

Row is in DataSet 2 but not in DataSet 1

Row data appears in the current data section of the DiffGram. The row element contains the attribute diffgr:hasChanges with a value of "inserted". (In the example, rows with Department IDbetween 6 and 8 appear.)

Row is DataSet 1 but not in DataSet 2

Row data appears only in the original <diffgr:before> block of the DiffGram. (In the example, rows with DepartmentID between 1 and 3 appear.)

The sample begins by loading two different subsets of data from the HumanResources. Department table into separate DataSet objects. In this example, the DataSet objects both contain just a single table. To determine the difference between the DataSet objects, the tables within the DataSet objects are compared as described next and changes are applied to the data in a copy of the first DataSet until it matches the second DataSet. Once all differences in all tables are processed, the DiffGram of the copy of the first DataSet contains the difference in the second DataSet when compared to the first DataSet.

More specifically, a copy of each table is made as it is processed. The data in the copy of the first table is modified to make it consistent with the data in the second table. The modified copy of the first table is then added to the DataSet containing the differences between the two DataSet objects.

The process of modifying the data in the copy of the first table to match the data in second table involves several steps:

  • Rows that are in the copy of the first table but not in the second table (based on the primary key value) are deleted from the copy of the first table.

  • If the row is found in the second table, the columns are compared and any differences in the columns in the second table are changed in the column in the first table.

  • Rows that are in the second table but not in the copy of the first table are inserted into the copy of the first table without accepting changes.

Get ADO.NET 3.5 Cookbook, 2nd Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.