Summary
In this chapter, we learned about several core concepts in natural language processing. We discussed tokenization, which splits input text into individual tokens. We learned how to reduce words to their base forms using stemming and lemmatization. We implemented a text chunker to divide input text into chunks based on predefined conditions.
We discussed the Bag of Words model and built a document-term matrix for input text. We then learned how to categorize text using machine learning. We constructed a gender identifier using a heuristic. We used machine learning to analyze the sentiment of movie reviews. We discussed topic modeling and implemented a system to identify topics in a given document.
In the next chapter, we will ...