Chapter 21. Using AI to Build a Comprehensive Database of Knowledge

Extracting structured information from semi-structured or unstructured data sources (“dark data”) is an important problem. One can take it a step further by attempting to automatically build a knowledge graph from the same data sources. Knowledge databases and graphs are built using (semi-supervised) machine learning and then used to power the intelligent systems that form the basis of AI applications. The more advanced messaging and chat bots you’ve encountered rely on these knowledge stores to interact with users.

In the June 2, 2016, episode of the Data Show, I spoke with Mike Tung, founder and CEO of Diffbot, a company dedicated to building large-scale knowledge databases. Diffbot is at the heart of many web applications, and it’s starting to power a wide array of intelligent applications. We talked about the challenges of building a web-scale platform for highly accurate, semi-supervised, structured data extraction. We also took a tour through the AI landscape and the early days of self-driving cars.

Here are some highlights from our conversation.

Building the Largest Structured Database of Knowledge

If you think about the web as a virtual world, there are more pixels on the surface area of the web than there are square millimeters on the surface of the earth. As a surface for computer vision and parsing, it’s amazing, and you don’t have to actually build a physical robot in order ...
