Entity-Centric Analysis: A Deeper Understanding of the Data

Throughout this chapter, it’s been implied that analytic approaches that exhibit a deeper understanding of the data can be dramatically more powerful than approaches that simply treat each token as an opaque symbol. But what does “a deeper understanding” of the data really mean? One interpretation is being able to detect the entities in documents and using those entities as the basis of analysis, as opposed to document-centric analysis involving keyword searches or interpreting a search input as a particular type of entity and customizing results accordingly. Although you may not have thought about it in those terms, this is precisely what emerging technologies such as WolframAlpha do at the presentation layer. For example, a search for “tim o’reilly” in WolframAlpha returns results that imply an understanding that the entity being searched for is a person; you don’t just get back a list of documents containing the keywords (see Figure 8-3). Regardless of the internal technique that’s used to accomplish this end, the resulting user experience is dramatically more powerful because the results conform to a format that more closely satisfies the user’s expectations.

Sample results for a “tim o’reilly” query with WolframAlpha

Figure 8-3. Sample results for a “tim o’reilly” query with WolframAlpha

Although it’s beyond the scope of this chapter to ponder the various possibilities of entity-centric ...

Get Mining the Social Web now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.