Chapter 4. Taming tokens

This chapter covers

  • Tokenization to extract ideas rather than words
  • The concepts of precision and recall in search
  • Making trade-offs between precision and recall
  • Controlling the specificity of matches
  • Encoding non-textual data into the search engine

At this point, you have a good understanding of why relevance is critical for the success of a search application (chapter 1). You also have a working knowledge of search engine internals (chapter 2) and can debug relevance to pin down why documents match and why they’re given a particular score (chapter 3).

Now, armed with motivation, knowledge, and tools, it’s time to dive into the art of relevance engineering. In this chapter, we focus on text analysis. Proper ...
