Chapter 11. Simple Entity Resolution in Graphs

Thinking back to our first example in this book, how do you know who your customer is in your C360 model?

Do you have a strong identifier in your dataset, like a social security number or member ID? How much do you trust those identifiers, and their source, to represent unique people with 100% accuracy?

Different industries have different tolerance levels for inaccuracy.

In healthcare, a false positive (wrongly merging the records of two different people) can lead to misdiagnoses and potentially deadly errors in dispensing medication. On the other hand, if you are working with data about movies, incorrectly resolving two movies will lead to a less-than-seamless user experience for your application, but at least no one’s life is on the line.

The problem of inferring who is whom and what is what from keys and values in your data source has been a challenge since we began writing down information about people. This problem is called entity resolution and has a long, storied history of technical solutions.

For any team working on entity resolution, it is important to get things right within whatever margin of error is acceptable in your business domain.

Chapter Preview: Merging Multiple Datasets into One Graph

In this chapter, we will unveil how we merged two movie datasets, the challenges we faced along the way, and the decisions we made.

First, we will define entity resolution and show how it relates to two problems we have been working through in this book: C360 and movie recommendations. ...

