Chapter 9: Profiling data in Azure

Data profiling is an important part of every data project. It helps the data modeler create an accurate data model, and it tells ETL developers what types of data the source contains and how clean that data is. Those findings, in turn, determine which transformations we should apply.

Data profiling can help us find which metrics we can derive from the source dataset and how much we need to change (transform) the data to meet business rules. It can also help us catch data inconsistencies before the ETL phase starts and derive a valid data model from the source dataset.
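To make the idea concrete, here is a minimal sketch of what a basic column profile might look like. It uses plain Python rather than any Azure service, and everything in it is an assumption made for illustration: the `profile_rows` helper, the sample rows, and the choice to treat `None` and empty strings as missing values are all hypothetical and not taken from the chapter.

```python
from collections import Counter


def profile_rows(rows):
    """Build a simple per-column profile for a list of dict records.

    Hypothetical helper for illustration only. For each column it reports
    the number of rows, how many values are missing, how many distinct
    non-missing values exist, and which Python types were observed.
    """
    # Collect column names in first-seen order across all rows.
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)

    profile = {}
    for name in columns:
        values = [row.get(name) for row in rows]
        # Assumption: None and the empty string both count as missing.
        present = [v for v in values if v is not None and v != ""]
        profile[name] = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "types": dict(Counter(type(v).__name__ for v in present)),
        }
    return profile


# Made-up sample data with typical problems: a repeated key,
# a blank country, and an amount stored as text in one row.
sample_rows = [
    {"customer_id": 1, "country": "US", "amount": 10.5},
    {"customer_id": 2, "country": "", "amount": "12.00"},
    {"customer_id": 2, "country": "DE", "amount": None},
]

for column, stats in profile_rows(sample_rows).items():
    print(column, stats)
```

Even a profile this small hints at the kind of decisions described above: a column with mixed types suggests a cast during transformation, a missing value suggests a default or a rejection rule, and a distinct count lower than the row count on a supposed key suggests duplicates to resolve before modeling.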

The process flow from data ingestion to reporting can be described with the following diagram:

Figure 9.1 – An overview of the data profiling process ...
