Data Flow best practices in Extract and Load
When we extract data from sources, the main focus is on getting just the changed data (inserts, updates, and deletes) since our last extraction. Reading all data (that could be millions of rows) from sources and then making a comparison row-by-row could cause a bottleneck in the system.
To get just the required data, several techniques can be applied:
- Audit columns: Having a date and time column for changes or even having two columns, one for insert and another for the update of a specific record.
- Checksums: Using an algorithm to create a unique identifier for a column set. Microsoft included in SQL 2008 a new SQL function that enables this technique to be applied faster and more easily.
- Change Data Capture ...