Chapter 8. Blogs et al.: Natural Language Processing (and Beyond)
This chapter is a modest attempt to introduce Natural Language Processing (NLP) and apply it to the unstructured data in blogs. In the spirit of the prior chapters, it attempts to present the minimal level of detail required to empower you with a solid general understanding of an inherently complex topic, while also providing enough of a technical drill-down that you’ll be able to immediately get to work mining some data. Although we’ve been regularly cutting corners and taking a Pareto-like approach (giving you the crucial 20% of the skills that you can use to do 80% of the work), the corners we’ll cut in this chapter are especially pronounced because NLP is just that complex. No chapter out of any book, or any small multivolume set of books for that matter, could possibly do it justice. This chapter is a pragmatic introduction that’ll give you enough information to do some pretty amazing things, like automatically generating abstracts from documents and extracting lists of important entities, but we will not journey very far into topics that would require multiple dissertations to sort out.
Although it’s not absolutely necessary that you have read Chapter 7 before you dive into this chapter, it’s highly recommended that you do so. A good understanding of Natural Language Processing presupposes an appreciation and working knowledge of some of the fundamental strengths and weaknesses of TF-IDF, vector ...