The natural language processing pattern
This design pattern explores the implementation of natural language processing on unstructured text data using Pig.
Information retrieval from unstructured data, such as blogs and articles, revolves around extracting meaningful information from large volumes of un-annotated text. Its core goal is to turn that text into structured information, which is then indexed to make searching faster. For example, consider the following sentence:
"Graham Bell invented the telephone in 1876"
The following structured information can be extracted from the preceding sentence:
InventorOf(Telephone, Graham Bell)
InventedIn(Telephone, 1876)
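To make the mapping from sentence to relations concrete, here is a minimal sketch in plain Python. It is only a toy illustration and not the technique this pattern implements in Pig: it assumes sentences of the exact shape "<person> invented the <thing> in <year>", and the names INVENTION_PATTERN and extract_invention_facts are hypothetical, introduced just for this example. Real text would need a proper NLP toolkit rather than a hand-written pattern.

```python
import re

# Hypothetical, hand-written pattern for one sentence shape only:
# "<inventor> invented [the|a|an] <invention> in <4-digit year>"
INVENTION_PATTERN = re.compile(
    r"^(?P<inventor>.+?) invented (?:the |a |an )?(?P<invention>.+?)"
    r" in (?P<year>\d{4})\.?$"
)


def extract_invention_facts(sentence):
    """Return (relation, subject, value) tuples, or [] if nothing matches."""
    match = INVENTION_PATTERN.match(sentence.strip())
    if match is None:
        return []
    # Capitalize the invention so it reads like an entity name.
    invention = match.group("invention").capitalize()
    return [
        ("InventorOf", invention, match.group("inventor")),
        ("InventedIn", invention, int(match.group("year"))),
    ]


facts = extract_invention_facts("Graham Bell invented the telephone in 1876")
for relation, subject, value in facts:
    print(f"{relation}({subject}, {value})")
# prints:
# InventorOf(Telephone, Graham Bell)
# InventedIn(Telephone, 1876)
```

Each tuple is a structured record that could be stored and indexed, which is what makes the extracted facts searchable in a way the raw sentence is not.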
There are a number of ways in which ...