Chapter 11. Further Considerations
Hopefully the previous chapters have given you a solid practical understanding of how to resolve entities within your datasets and have equipped you to overcome some of the challenges you are likely to meet along the way.
Real-world data is messy and full of surprises, so joining it up is rarely straightforward. But it’s well worth spending the time to make the connections because the story becomes so much richer when we can bring together all the pieces of the jigsaw.
In this short closing chapter, I’ll talk about a few aspects of entity resolution that are worth considering when building a resilient production solution. I’ll also share some closing thoughts on the future of the art and science of entity resolution.
Data Considerations
As with any analytic process, the importance of understanding the context and quality of your input data cannot be overstated. Quirks or misunderstandings in data that a traditional application could tolerate may fundamentally derail a matching process. Poor data can result in both overlinking, where records that do not refer to the same person are matched, and underlinking, where records that do refer to the same person are missed, with potentially serious consequences in either case.
In this section, I’ll discuss the most important data-related issues to consider when performing entity resolution.
Unstructured Data
Throughout this book, we have primarily used structured data to perform the matching process. When we encountered semi-structured data, we used very simple rules of thumb to extract ...