Chapter 8 Data Profiling
Finally, we are going to talk about data. Most of the book so far has centered on planning and infrastructure: what goes into the project before you actually start. Now it is time to get our hands dirty, and I really mean it! No matter what any external consultant tells you about the state of your data, there is no excuse for not sitting down to take a good, hard look at your data sets to see whether they really display the characteristics you think they do. This process, a large part of which can be automated, is referred to as data profiling.
The goal of profiling data is to discover metadata when it is not available and to validate metadata when it is available. Data profiling is a process ...
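To make the distinction between discovering and validating metadata concrete, here is a minimal Python sketch of single-column profiling. It is an illustration under stated assumptions, not the author's method. It assumes that a column arrives as a list of raw strings, that the only types considered are int, float, and string, and that an empty string counts as a null. The function names and the metadata keys (`type`, `nullable`, `unique`) are hypothetical. `discover` infers metadata when none is documented, and `validate` checks documented metadata against what the data actually shows.

```python
from collections import Counter

# Widening order: an int value also satisfies a declared float, and so on.
# "unknown" means every value in the column was null.
RANK = {"unknown": -1, "int": 0, "float": 1, "string": 2}


def infer_type(value: str) -> str:
    """Classify one raw value as 'null', 'int', 'float', or 'string'."""
    if value.strip() == "":
        return "null"  # assumption: empty or blank strings are nulls
    for name, cast in (("int", int), ("float", float)):
        try:
            cast(value)
            return name
        except ValueError:
            pass
    return "string"


def discover(values: list[str]) -> dict:
    """Discover metadata for one column when none is documented."""
    type_counts = Counter(infer_type(v) for v in values)
    non_null = [v.strip() for v in values if infer_type(v) != "null"]
    observed = {t for t in type_counts if t != "null"}
    # Pick the narrowest type that covers every non-null value.
    if not observed:
        inferred = "unknown"
    elif observed <= {"int"}:
        inferred = "int"
    elif observed <= {"int", "float"}:
        inferred = "float"
    else:
        inferred = "string"
    return {
        "type": inferred,
        "null_count": type_counts["null"],  # Counter returns 0 if absent
        "distinct_count": len(set(non_null)),
        "unique": len(set(non_null)) == len(non_null),
    }


def validate(values: list[str], declared: dict) -> list[str]:
    """Compare documented (declared) metadata with what the data shows."""
    found = discover(values)
    problems = []
    if RANK[found["type"]] > RANK[declared["type"]]:
        problems.append(f"type: declared {declared['type']}, found {found['type']}")
    if not declared.get("nullable", True) and found["null_count"] > 0:
        problems.append(
            f"nullable: declared False, found {found['null_count']} null value(s)"
        )
    if declared.get("unique", False) and not found["unique"]:
        problems.append("unique: declared True, found duplicate values")
    return problems


# Usage: a customer ID column documented as a non-null, unique integer key.
customer_ids = ["1001", "1002", "", "1002", "10O3"]  # note the letter O in "10O3"
print(discover(customer_ids))
# {'type': 'string', 'null_count': 1, 'distinct_count': 3, 'unique': False}
for problem in validate(customer_ids, {"type": "int", "nullable": False, "unique": True}):
    print(problem)
```

Every declared property fails on this small sample: a mistyped value turns the column into strings, one value is missing, and one ID repeats. A real profiling tool would add many more measures (value ranges, patterns, frequency distributions, cross-column dependencies), but the split between inferring metadata and checking it stays the same.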