A Deep Dive into the Capture Phase
Abstract
This chapter describes the start of the CSRUD Life Cycle: the initial capture and storage of entity identity information. It also discusses the importance of understanding the characteristics of the data, preparing the data properly, selecting identity attributes, and devising matching strategies. Perhaps most importantly, it discusses methods and techniques for evaluating entity resolution (ER) outcomes.
Keywords
Data profiling; data matching; benchmarking; truth sets; review indicators
An Overview of the Capture Phase