Performing context Ngram in Hive
Ngrams are sequences that are collected from specific sets of words and are based on their occurrence in a given text. N-grams are generally used to find the occurrence of certain words in a sequence, which helps in the calculation of sentiment analysis. Hive provides built-in support for Ngram calculations by providing a function. In this recipe, we will take a look at how to use this function in order to analyze text data.
Getting ready
To perform this recipe, you should have a running Hadoop cluster as well as the latest version of Hive installed on it. Here, I am using Hive 1.2.1.
How to do it...
N-gram can be used to find the most frequently used word after a sequence of words in a give text dataset. To do this, ...
Get Hadoop: Data Processing and Modelling now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.