Chapter 5. Managing Indexes

As we briefly mentioned in the preceding chapter, Sphinx lets you search through multiple indexes at the same time. There are usually two reasons for devoting multiple indexes to the same application area: the main+delta strategy, which greatly reduces the delay in keeping an index up-to-date, and parallelizing queries across indexes, which reduces the delay in responding to queries. All serious production sites use multiple indexes, so you’ll find this chapter to be a natural sequel to the preceding one. Using multiple indexes leads to complexities that I’ll cover in this chapter. I’ll also occasionally diverge from the “general overview” approach of previous chapters and focus more on specific features, the nitty and even some of the gritty details of engine internals, and concrete use cases and dos and don’ts.

The “Divide and Conquer” Concept

Plain disk indexes need to be fully rebuilt from scratch every time you need to update the text data they contain. This can lead to delays of minutes or even hours before new and updated rows appear in response to queries, and that’s not even counting the wasted CPU cycles and network bandwidth.

Many people, including myself, lack the patience for this. Should you stand for this in your very own applications? It depends on the numbers, and concrete figures are easy to approximate with a bit of simple back-of-the-envelope math.
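
As a minimal sketch of that back-of-the-envelope math, the following Python snippet estimates how long a full rebuild takes from two inputs: the amount of text to index and the sustained indexing throughput. The function names and the sample figures (10 GB of text at 10 MB per second) are illustrative assumptions, not measurements; substitute numbers from your own hardware and data.

```python
def rebuild_seconds(data_bytes: float, bytes_per_second: float) -> float:
    """Estimate a full rebuild: total data divided by sustained throughput."""
    if bytes_per_second <= 0:
        raise ValueError("throughput must be positive")
    return data_bytes / bytes_per_second


def format_duration(seconds: float) -> str:
    """Render a duration in seconds as hours, minutes, and seconds."""
    minutes, secs = divmod(int(round(seconds)), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the arithmetic.
    data_size = 10 * 1024**3    # 10 GB of text to index
    throughput = 10 * 1024**2   # 10 MB/s sustained indexing rate

    delay = rebuild_seconds(data_size, throughput)
    print(f"Full rebuild: about {format_duration(delay)}")
```

The estimate is also the worst-case delay before a new row becomes searchable if the rebuild runs back to back, which is the figure to weigh against how fresh your application needs its results to be.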

On modern commodity gear (which, at the time of this writing, means multicore CPUs ...
