Chapter 11. Basic Indexing

This chapter will take you into the mysterious and sometimes puzzling world of database indexes. As soon as your dataset starts growing and performance starts degrading as a result, indexes become a necessity.

Just for a moment, let’s imagine that there are no indexes. Every XPath request would then have to be resolved by brute force. For a query like //line[@author eq "erik"], the full node tree of every document must be traversed to find line elements whose author attribute matches the value erik. You can probably see that on a large dataset this is an intensive, and ultimately slow, operation. If you further imagine many users running such queries on demand and in parallel, things can only get worse!
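To make the difference concrete, here is a minimal sketch in Python of the two strategies for that query. It is only an illustration of the idea, not a description of how eXist implements its indexes: the sample document, the function names brute_force and build_author_index, and the use of a plain dictionary as the "index" are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document; the element and attribute names follow
# the query //line[@author eq "erik"] used in the text.
SAMPLE = """
<poem>
  <verse>
    <line author="erik">First line</line>
    <line author="adam">Second line</line>
  </verse>
  <verse>
    <line author="erik">Third line</line>
  </verse>
</poem>
"""


def brute_force(root, author):
    """No index: visit every element in the tree and test each one."""
    return [
        el
        for el in root.iter()
        if el.tag == "line" and el.get("author") == author
    ]


def build_author_index(root):
    """Toy index: map each author value to the line elements carrying it.

    The tree is walked once, up front; later lookups do not walk it again.
    """
    index = defaultdict(list)
    for el in root.iter("line"):
        value = el.get("author")
        if value is not None:
            index[value].append(el)
    return index


root = ET.fromstring(SAMPLE)
author_index = build_author_index(root)

scan_hits = brute_force(root, "erik")       # cost grows with document size
index_hits = author_index.get("erik", [])   # a single dictionary lookup

print([el.text for el in scan_hits])
print([el.text for el in index_hits])
```

Both strategies return the same elements. The difference is that the scan repeats the full traversal for every query, while the dictionary is built once and then answers each lookup directly.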

Of course, indexes come with a cost of their own: when XML documents are created or updated, the corresponding indexes must be updated too. However, this is generally not a problem. For most (but not all) applications, updating is a much rarer event than querying, and the short delays incurred while updating the indexes go unnoticed.

Large databases, XML or otherwise, rarely scale well without indexes. Performance degradation as the dataset grows can be linear and is often worse. Therefore, defining and tuning indexes is well worth the effort, and often a necessity.

Note

Besides the indexes mentioned here and in Chapter 12, there is also an index that supports explicit ordering, known as the sort index. Since ...
