The natural language processing pattern

This design pattern shows how to apply natural language processing to unstructured text data using Pig.

Background

Information retrieval from unstructured data, such as blogs and articles, revolves around extracting meaningful information from large volumes of unannotated text. Its core goal is to turn that unstructured text into structured information, which is then indexed to make searching more efficient. For example, consider the following sentence:

"Graham Bell invented the telephone in 1876"

The following structured information can be extracted from the preceding sentence:

InventorOf(Telephone, Graham Bell)
InventedIn(Telephone, 1876)
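
To make the idea of turning a sentence into structured relations concrete, here is a minimal sketch in Python. It is not the Pig-based approach this pattern goes on to describe. It assumes a single hand-written regular expression that only recognizes sentences of the form "<person> invented the <thing> in <year>". The names INVENTION_PATTERN and extract_relations are hypothetical and introduced only for this illustration.

```python
import re

# Hypothetical, intentionally narrow pattern: it only matches sentences shaped
# like "<person> invented the <thing> in <year>". Real text needs a proper
# NLP pipeline rather than a single regular expression.
INVENTION_PATTERN = re.compile(
    r"^(?P<inventor>.+?) invented the (?P<invention>\w+) in (?P<year>\d{4})$"
)


def extract_relations(sentence):
    """Return (relation, subject, value) tuples, or an empty list if no match."""
    match = INVENTION_PATTERN.match(sentence.strip().rstrip("."))
    if match is None:
        return []
    invention = match.group("invention").capitalize()
    return [
        ("InventorOf", invention, match.group("inventor")),
        ("InventedIn", invention, int(match.group("year"))),
    ]


for relation in extract_relations("Graham Bell invented the telephone in 1876"):
    print(relation)
```

Each tuple mirrors one of the relations listed above. A sentence that does not fit the assumed shape yields an empty list, which is why a fixed pattern like this does not scale to free-form text such as blogs and articles.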

There are a number of ways in which ...
