Chapter 11. Basic Indexing

This chapter will take you into the mysterious and sometimes puzzling world of database indexes. As soon as your dataset starts growing and performance starts degrading as a result, indexes become a necessity.

Just for a moment, let’s imagine that there are no indexes. Every XPath expression would then have to be resolved by brute force. So, for a query like //line[@author eq "erik"], the complete node tree of every document must be traversed to find line elements whose author attribute matches the value erik. You can probably see that on a large dataset this is an expensive, and ultimately slow, operation. If you further imagine many users running such queries on demand and in parallel, things can only get worse!
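To make the cost of brute force concrete, here is a minimal Python sketch of what an unindexed evaluation of //line[@author eq "erik"] amounts to. It is an illustration only, not how eXist evaluates queries internally: the sample document, the function name brute_force_lines, and the visited counter are all assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names mirror the query in the text.
DOC = """<poem>
  <line author="erik">first</line>
  <line author="adam">second</line>
  <stanza><line author="erik">third</line></stanza>
</poem>"""


def brute_force_lines(root, author):
    """Visit every element in the tree and keep the matching line elements."""
    visited = 0
    matches = []
    for elem in root.iter():  # walks the whole tree, root element included
        visited += 1
        if elem.tag == "line" and elem.get("author") == author:
            matches.append(elem)
    return matches, visited


root = ET.fromstring(DOC)
matches, visited = brute_force_lines(root, "erik")
print([m.text for m in matches], "elements visited:", visited)
```

Every element is inspected, whether or not it can match, so the work grows with the total size of the data rather than with the number of results.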

Of course, indexes come with a cost of their own: when XML documents are created or updated, the corresponding indexes must be updated too. However, this is generally not a problem. For most (but not all) applications, updating is a much rarer event than querying, and the short time lags created by updating the indexes go unnoticed.
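The trade-off can be sketched with a toy value index in Python. This is a simplified illustration under stated assumptions, not eXist's actual index structure: the class AuthorIndex and its put and find methods are hypothetical names invented for this example.

```python
from collections import defaultdict


class AuthorIndex:
    """Toy index mapping an author value to line ids (hypothetical)."""

    def __init__(self):
        self.lines = {}                    # line id -> author value
        self.by_author = defaultdict(set)  # author value -> set of line ids

    def put(self, line_id, author):
        """Create or update a line; the index is adjusted as part of the write."""
        old = self.lines.get(line_id)
        if old is not None:
            # Extra work on update: the stale index entry must be removed.
            self.by_author[old].discard(line_id)
        self.lines[line_id] = author
        self.by_author[author].add(line_id)

    def find(self, author):
        """Look up matching line ids without scanning every line."""
        return sorted(self.by_author.get(author, ()))


idx = AuthorIndex()
idx.put(1, "erik")
idx.put(2, "adam")
idx.put(3, "erik")
idx.put(3, "adam")  # an update: line 3 changes author

print(idx.find("erik"), idx.find("adam"))
```

Each write does a little more work to keep the index current, and in exchange each lookup touches only the entries for the requested value.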

Large databases, XML or otherwise, rarely scale well without indexes. As the dataset grows, query performance may degrade linearly, and often worse. Defining and tuning indexes is therefore well worth the effort, and often a necessity.


Besides the indexes mentioned here and in Chapter 12, there is also an index that supports explicit ordering, known as the sort index. Since this works differently ...
