Chapter 6. Relevance and Ranking

You’re now armed with a good chunk of knowledge about getting up and running with Sphinx, creating and managing indexes, and writing proper queries. However, there’s one more skill that’s of use with nearly every site: improving search quality. So, let’s spend some time discussing quality in general and what Sphinx can offer, shall we?

Relevance Assessment: A Black Art

We can’t really chase down “search quality” until we formally define it and decide how to measure it. An empirical approach, as in “Here, I just made up another custom ranking rule out of thin air and I think it will generally improve our results any time of day,” wears thin very quickly. After about the third such rule, the approach becomes unmanageable, because the number of rule combinations explodes combinatorially, and arguing about (not to mention proving) the value of every single combination quickly becomes impossible. A scientific approach, as in “Let us introduce some comprehensible numerical metrics that can be computed programmatically and then grasped intuitively,” lends itself to automation and scales somewhat better.
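To make “a numerical metric that can be computed programmatically” a little more concrete, here is a minimal sketch of one widely used measure, precision at k: the fraction of the top k results that a human assessor marked as relevant. This is an illustration only and not part of Sphinx; the function name, the document IDs, and the relevance judgments are all hypothetical.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Return the fraction of the top-k results that were judged relevant.

    ranked_ids   -- document IDs in the order the search engine returned them
    relevant_ids -- set of document IDs a human assessor marked as relevant
    k            -- how many top results to look at (must be positive)
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Dividing by k (not by len(top_k)) penalizes result sets shorter than k.
    return hits / k


# Hypothetical result list and judgments for a single query.
ranked = [12, 7, 33, 4, 19]
relevant = {7, 4, 50}

print(precision_at_k(ranked, relevant, 5))  # 2 relevant out of 5 results
```

A number like this can be recomputed automatically after every ranking change and averaged over many queries, which is what lets the scientific approach scale where arguing over individual rules does not.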

So, what is search quality? Chapter 1 mentioned that documents in the result set are, by default, ordered using a relevance ranking function that assigns a different weight to every document, based on the current query, document contents, other document attributes, and other factors. But it’s very important to realize that the relevance value that is computed ...
