Blocking
Abstract
This chapter discusses blocking as a technique for reducing the total number of pairwise comparisons an entity resolution (ER) algorithm must make to arrive at an acceptable clustering result. A practical ER system must use blocking or some other form of comparison reduction. The chapter focuses on one particular type of blocking, match key blocking. It also discusses the importance of match-key-to-rule alignment, match key precision, and match key recall, along with strategies for creating and optimizing match key generators.
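As a rough illustration of the idea, the following Python sketch groups records by a match key stored in an inverted index and compares only records that share a key. The match key generator (first three letters of the last name plus postal code), the field names `last` and `zip`, and the sample records are all assumptions made for this example; they are not taken from the chapter.

```python
from collections import defaultdict
from itertools import combinations


def match_key(record):
    """Hypothetical match key generator: first three letters of the
    last name (upper-cased) plus the postal code."""
    return (record["last"][:3].upper(), record["zip"])


def build_blocks(records):
    """Build an inverted index mapping each match key to the ids of
    the records that generate it."""
    blocks = defaultdict(list)
    for rec_id, record in records.items():
        blocks[match_key(record)].append(rec_id)
    return blocks


def candidate_pairs(blocks):
    """Pair up only the records that fall into the same block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Made-up sample records, for illustration only.
records = {
    1: {"last": "Johnson", "zip": "72204"},
    2: {"last": "Johnsen", "zip": "72204"},
    3: {"last": "Smith", "zip": "72204"},
    4: {"last": "Smith", "zip": "10001"},
    5: {"last": "Smithe", "zip": "10001"},
}

blocks = build_blocks(records)
pairs = candidate_pairs(blocks)

# Without blocking, every record is compared with every other record.
all_pairs = len(records) * (len(records) - 1) // 2

print(f"{len(pairs)} candidate pairs instead of {all_pairs}")
```

With this toy key, only the pairs (1, 2) and (4, 5) are passed on for comparison, rather than all 10 possible pairs. The trade-off is recall: a record spelled "Jonson" would generate the key prefix `JON` rather than `JOH`, land in a different block, and never be compared with record 1, which is why the choice of match key generator matters.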
Keywords
Blocking; Match Key; Inverted Indexing