Effective Data Warehouse Indexing
This chapter is divided into three distinct parts. The first part examines the
basics of indexing, including different types of available indexes. The
second part of this chapter attempts to prove the usefulness, or otherwise, of
bitmap indexes, bitmap join indexes, star queries, and star transformations.
Lastly, this chapter briefly examines the use of index organized tables
(IOTs) and clusters in data warehouses.
3.1 The Basics of Indexing
An index in a database is essentially a copy of a small physical portion of a
table, typically one or a few columns plus a pointer back to each row. The
index is used to access the table because reading the index means reading only
a fraction of what a read of the table itself would require, resulting in
much less physical I/O activity. Note that it is sometimes faster to read the
entire table than to read both the index and the table. For some tables, such
as a table containing a small amount of static data, it is best not to create
indexes at all.
Indexes are not always required, and too many indexes can sometimes
even hurt performance.
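As a rough illustration of why an index lookup can touch far less data than a table scan, the following Python sketch models a table as a list of rows and an index as a sorted list of keys with row positions. The function names (`build_index`, `lookup_via_scan`, `lookup_via_index`) are hypothetical, and the count of entries touched is only a stand-in for physical I/O. It is a toy model under those assumptions, not a description of how Oracle stores or reads indexes.

```python
def build_index(rows, column):
    """Build a toy index: sorted keys plus the position of each key's row."""
    pairs = sorted((row[column], pos) for pos, row in enumerate(rows))
    keys = [key for key, _ in pairs]
    positions = [pos for _, pos in pairs]
    return keys, positions


def lookup_via_scan(rows, column, value):
    """Full table scan: every row is examined once."""
    matches = [row for row in rows if row[column] == value]
    return matches, len(rows)


def lookup_via_index(rows, index, value):
    """Binary-search the index, then fetch only the matching table rows."""
    keys, positions = index
    lo, hi, touched = 0, len(keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        touched += 1                      # one index entry probed
        if keys[mid] < value:
            lo = mid + 1
        else:
            hi = mid
    matches = []
    while lo < len(keys) and keys[lo] == value:
        matches.append(rows[positions[lo]])
        touched += 1                      # one table row fetched
        lo += 1
    return matches, touched


# Hypothetical 10,000-row table with a unique "id" column.
table = [{"id": i, "amount": i * 10} for i in range(10_000)]
id_index = build_index(table, "id")
scan_rows, scan_touched = lookup_via_scan(table, "id", 4242)
index_rows, index_touched = lookup_via_index(table, id_index, 4242)
print(scan_touched, index_touched)
```

Both lookups return the same row, but the scan examines every row while the index path probes only a handful of entries. On a very small table the two counts become comparable, which is consistent with the point that small static tables may not need an index at all.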
OLTP and transactional databases generally require changes to small
amounts of data and benefit greatly from extensive use of precise indexing.
Data warehouses often read data from entire tables, so indexes can sometimes
be superfluous. However, there are some very specific types of indexes that are
useful in data warehouses in particular, such as bitmaps, clusters, and IOTs.
Bitmaps, IOTs, and clusters are not amenable to data changes but are
designed to be highly efficient for high-throughput reads common to data
warehouses. BTree indexes are efficient for both reading and updating of
data, and are best used for the exact hits and range scans common to OLTP
databases. Non-BTree indexes are vulnerable to overflow when updated. Bitmap
indexes are particularly vulnerable to overflow and catastrophic degradation
in performance over long periods of update activity.
What exactly index overflow is, and why it hinders performance,
is explained later in this chapter.
Remember one more thing. The more indexes that are created on a
table, the more updates occur when a table row is changed. For example, if
a single table has five indexes, then a single row insertion involves one table
row insertion and five index row insertions, which is six insertions
altogether. That is not good for performance!
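The arithmetic above can be written out as a short Python sketch. The function names are hypothetical, and the model counts only logical insertions (one per table row and one per index entry). It deliberately ignores everything else a real database does during a load, so treat it as an illustration of the ratio rather than a measurement.

```python
def writes_per_insert(index_count):
    """One table row insertion plus one index entry insertion per index."""
    return 1 + index_count


def total_writes(rows_loaded, index_count):
    """Logical insertions needed to load a batch of rows."""
    return rows_loaded * writes_per_insert(index_count)


# The example from the text: a table with five indexes.
print(writes_per_insert(5))
# A hypothetical one-million-row load, with and without the indexes.
print(total_writes(1_000_000, 5), total_writes(1_000_000, 0))
```

Under this simplified count, every index added to a table raises the cost of each row insertion by one further insertion, which is why heavily indexed tables are expensive to load.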
3.1.1 The When and What of Indexing
It has already been mentioned that an index is not always needed in every
circumstance that might seem to call for one. What are some
specific examples where an index might actually hinder database performance?
If a table contains a small number of rows, the optimizer is likely to
perform a full table scan on that table, unless the table is part of a highly
complex join. In this case any indexing might be ignored.
Data warehouses are more often than not I/O intensive. This is
because they read either large portions of tables or entire tables at
In the past, many books have stated that the optimizer will generally
execute a full table scan by default when more than 10% of a table is read.
The 10% marker does not always hold, however, especially since
the optimizer gets better with every new version of Oracle Database.
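One way to see why no fixed percentage can hold is a toy cost comparison, sketched below in Python. Everything in it is an assumption made for illustration: the function names, the parameters (`blocks_per_read`, `index_height`, `rows_per_fetched_block`), and the cost formulas themselves. It is not the Oracle optimizer's costing, only a simplified model in which a full scan reads the table in multiblock chunks and an index access pays for descending the index plus the table blocks it visits.

```python
def full_scan_cost(table_blocks, blocks_per_read):
    """Reads needed to scan the whole table using multiblock reads."""
    return -(-table_blocks // blocks_per_read)  # ceiling division


def index_access_cost(rows, selectivity, index_height, rows_per_fetched_block):
    """Reads to descend the index plus reads for the table blocks visited."""
    rows_wanted = rows * selectivity
    return index_height + rows_wanted / rows_per_fetched_block


def cheaper_plan(rows, table_blocks, selectivity, blocks_per_read=8,
                 index_height=3, rows_per_fetched_block=1):
    """Return 'index' or 'full scan', whichever is cheaper in this toy model."""
    scan = full_scan_cost(table_blocks, blocks_per_read)
    index = index_access_cost(rows, selectivity, index_height,
                              rows_per_fetched_block)
    return "index" if index < scan else "full scan"


def break_even_selectivity(rows, table_blocks, blocks_per_read=8,
                           index_height=3, rows_per_fetched_block=1):
    """Fraction of rows at which both plans cost the same in this toy model."""
    scan = full_scan_cost(table_blocks, blocks_per_read)
    return (scan - index_height) * rows_per_fetched_block / rows


# Hypothetical table: 1,000,000 rows stored in 10,000 blocks.
# Wanted rows scattered across the table (one useful row per block read):
print(break_even_selectivity(1_000_000, 10_000, rows_per_fetched_block=1))
# Wanted rows stored together (100 useful rows per block read):
print(break_even_selectivity(1_000_000, 10_000, rows_per_fetched_block=100))
```

In this model the break-even point moves by two orders of magnitude depending only on how the wanted rows are laid out on disk, so a single threshold such as 10% cannot be expected to fit every table.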
Sometimes in a data warehouse, unless an index is helping in sorting
of returned rows, one should consider the implications of including
unnecessary indexing. Always check the physical order in which
data warehouses create fact tables. Utilities such as SQL*Loader can
append rows to the end of files regardless of existing usable block