In the preceding chapters, we relied solely on the structure of HTML documents to scrape information from them, which is a powerful extraction method.
However, for many use cases, document structure alone does not yield specific enough information, and we must turn to algorithms and techniques that work directly on the raw text.
We will survey natural language processing (NLP) techniques and their common use cases in this chapter. The goal here is to present NLP methods and case studies illustrating their ...