Chapter 9

Blocking

Abstract

This chapter discusses blocking as a technique for reducing the total number of pairwise comparisons an ER algorithm needs to arrive at an acceptable clustering result. A practical ER system must use blocking or some other form of comparison reduction. The chapter focuses on a particular type of blocking called match key blocking. It also discusses the importance of match-key-to-rule alignment, match key precision, and match key recall, along with strategies for creating and optimizing match key generators.
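As a rough illustration of the idea, match key blocking can be sketched as an inverted index from match key to record identifiers, with comparisons restricted to records that share a key. The sketch below is a minimal example under stated assumptions, not the chapter's implementation: the key definition (first three letters of the last name plus the ZIP code), the field names `last` and `zip`, the function names, and the sample records are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def match_key(record):
    """Hypothetical match key generator: first 3 letters of last name + ZIP."""
    return (record["last"].strip().upper()[:3], record["zip"].strip())


def build_blocks(records):
    """Build an inverted index mapping each match key to the record ids sharing it."""
    index = defaultdict(list)
    for rid, rec in records.items():
        index[match_key(rec)].append(rid)
    return index


def candidate_pairs(blocks):
    """Return only the pairs of record ids that fall in the same block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Illustrative records (made up for this sketch).
records = {
    1: {"last": "Johnson", "zip": "72204"},
    2: {"last": "Johnsen", "zip": "72204"},
    3: {"last": "Smith", "zip": "72204"},
    4: {"last": "Smyth", "zip": "10001"},
    5: {"last": "johnston ", "zip": "72204"},
}

blocks = build_blocks(records)
pairs = candidate_pairs(blocks)

# Without blocking, every pair would be compared: n * (n - 1) / 2.
all_pairs = len(records) * (len(records) - 1) // 2

print(f"all pairs: {all_pairs}, candidate pairs after blocking: {len(pairs)}")
print(sorted(pairs))
```

With these five records, blocking reduces ten possible comparisons to the three pairs among records 1, 2, and 5. Records 3 and 4 ("Smith" and "Smyth") never share a block, so a matching rule that would have linked them is never given the chance. That is the kind of loss that match key recall and match-key-to-rule alignment are concerned with.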

Keywords

Blocking; Match Key; Inverted Indexing

Blocking

As necessary as the considerations discussed in Chapter 8 are to the design of a logically sound ER system, they are not sufficient to implement ...
