Chapter 6. Detecting Liars and the Confused in Contradictory Online Reviews

Jacob Perkins

Did you know that people lie for their own selfish reasons? Even if this is totally obvious to you, you may be surprised at how blatant this practice has become online, to the point where some people will explain their reasons for lying immediately after doing so.

I knew unethical people would lie in online reviews in order to inflate ratings or attack competitors. What I didn’t know, and only learned by accident, is that individuals will sometimes write reviews that completely contradict their associated rating, without any regard for how it affects a business’s online reputation. Often, these are businesses the reviewer actually likes.

How did I learn this? By using ratings and reviews to create a sentiment corpus, I trained a sentiment analysis classifier that could reliably determine the sentiment of a review. While evaluating this classifier, I discovered that it could also detect discrepancies between the review sentiment and the corresponding rating, thereby finding liars and confused reviewers. Here’s the whole story of how I used text classification to identify an unexpected source of bad data...
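To make the idea concrete before the full story, here is a minimal sketch of the approach in Python. It is not the classifier described in this chapter: it assumes a tiny hand-rolled Naive Bayes model, a made-up set of reviews, and a hypothetical rule that 4 to 5 stars means positive and 1 to 2 stars means negative. All function names (`label_from_rating`, `train`, `classify`, `find_discrepancies`) are illustrative only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and split it into simple word tokens.
    return re.findall(r"[a-z']+", text.lower())


def label_from_rating(rating):
    # Assumed thresholds: 4-5 stars is positive, 1-2 is negative, 3 is skipped.
    if rating >= 4:
        return "pos"
    if rating <= 2:
        return "neg"
    return None


def train(reviews):
    # Build per-label word counts, using the star rating as the label.
    word_counts = {"pos": Counter(), "neg": Counter()}
    doc_counts = Counter()
    for text, rating in reviews:
        label = label_from_rating(rating)
        if label is None:
            continue
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, doc_counts


def classify(model, text):
    # Naive Bayes with add-one smoothing; words outside the vocabulary are ignored.
    # Assumes the training data contained at least one review per label.
    word_counts, doc_counts = model
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("pos", "neg"):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocab:
                count = word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


def find_discrepancies(model, reviews):
    # Flag reviews whose predicted sentiment contradicts their star rating.
    flagged = []
    for text, rating in reviews:
        expected = label_from_rating(rating)
        if expected is not None and classify(model, text) != expected:
            flagged.append((text, rating))
    return flagged


# Made-up example data, for illustration only.
training = [
    ("great food and friendly service", 5),
    ("loved it, great atmosphere", 4),
    ("amazing friendly staff", 5),
    ("terrible food and rude service", 1),
    ("awful, rude staff, hated it", 2),
    ("horrible slow service", 1),
]
suspect = [
    ("great food, friendly staff, loved it", 1),
    ("terrible and rude", 1),
]

model = train(training)
flagged = find_discrepancies(model, suspect)
print(flagged)
```

In this toy data, the first suspect review reads as positive but carries a one-star rating, so it is the one that gets flagged. A real corpus would need far more reviews and a proper evaluation before trusting any flag.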


At my company, Weotta,[8] we produce applications and APIs for navigating local data in ways that people actually care about, so we can answer questions like: Is there a kid-friendly restaurant nearby? What’s the nearest hip yoga studio? What concerts are happening this weekend? ...
