Identifying parts of speech, handling n-grams, and recognizing named entities
One of the first things you might want to do is identify the part of speech of each word; to understand a sentence, it is fundamental to know whether a word such as checks is being used as a verb or as a noun.
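To make the checks example concrete, here is a minimal sketch of how the preceding word can hint at a part of speech. It is a toy rule, not a real tagger: the word lists and the names guess_tag and tag_word are assumptions made up for this illustration. In practice you would use a trained tagger from a library such as NLTK.

```python
# Toy context rule for illustration only; the word lists are assumptions,
# not a linguistic resource.
DETERMINERS = {"the", "a", "an", "these", "those", "my", "his", "her"}
SUBJECTS = {"he", "she", "it", "who", "nobody", "everyone"}


def guess_tag(tokens, index):
    """Guess NOUN or VERB for tokens[index] by looking at the previous word."""
    previous = tokens[index - 1].lower() if index > 0 else ""
    if previous in DETERMINERS:
        return "NOUN"      # "the checks", "his checks"
    if previous in SUBJECTS:
        return "VERB"      # "she checks", "it checks"
    return "UNKNOWN"       # the toy rule has no opinion


def tag_word(sentence, word):
    """Return (token, guessed tag) for every occurrence of word in sentence."""
    tokens = sentence.split()
    return [
        (token, guess_tag(tokens, i))
        for i, token in enumerate(tokens)
        if token.lower().strip(".,") == word
    ]


print(tag_word("She checks the logs every morning.", "checks"))
print(tag_word("The checks bounced.", "checks"))
```

The same surface form receives a different tag in each sentence, which is exactly the ambiguity a part-of-speech tagger has to resolve.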
Useful as it is, part-of-speech tagging will not help you handle bigrams (or, more generally, n-grams): clusters of words that, if analyzed separately in a given context, would lead to a misreading of the text. For example, consider the phrase neural networks in an article on machine learning, more specifically on an application of neural networks to control packet scheduling and routing in a local network. In the same article, these two words (neural and networks) can occur on their ...
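As a rough illustration of why word clusters matter, the sketch below counts bigrams alongside single words using only the Python standard library. The sample text and the helper names tokenize and ngrams are assumptions invented for this example; it shows plain n-gram counting, not any particular library's method.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic runs only (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return list(zip(*(tokens[i:] for i in range(n))))


# Hypothetical sample text, written for this example.
text = (
    "Neural networks can control packet scheduling in a local network. "
    "The neural networks learn which networks are congested."
)

tokens = tokenize(text)
unigram_counts = Counter(tokens)
bigram_counts = Counter(ngrams(tokens, 2))

# Compare how often the word appears alone with how often the phrase appears.
print(unigram_counts["networks"])
print(bigram_counts[("neural", "networks")])
```

Counting only single words would lump every occurrence of networks together; counting bigrams separates the occurrences that belong to the phrase neural networks from those that refer to computer networks.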