O'Reilly logo

Mining the Social Web by Matthew A. Russell

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Closing Remarks

This chapter introduced some of the fundamentals of IR theory: TF-IDF, cosine similarity, and collocations. Given the immense power of search providers like Google, it’s easy to forget that these foundational search techniques even exist. However, understanding them yields insight into the assumptions and limitations of the commonly accepted status quo for search, while also clearly differentiating the state-of-the-art entity-centric techniques that are emerging. (Chapter 8 introduces a fundamental paradigm shift away from the tools in this chapter and should make the differences more pronounced than they may seem if you haven’t read that material yet.) If you’d like to try applying the techniques from this chapter to the Web (in general), you might want to check out Scrapy, an easy-to-use and mature web scraping and crawling framework.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required