Processing Star Queries on Hierarchically-Clustered
Fact Tables
Nikos Karayannidis l, Aris Tsois l, Timos Sellis 1 , Roland Pieringer 2, Volker Markl 4,
Frank Ramsak3,Robert Fenk 3, Klaus Elhardt 2, Rudolf Bayer 5
Institute of Communication and Computer Systems and Department of Electrical and Computer
Engineering, National Technical University of Athens, Zographou 15773 Athens, Hellas
{nikos, atsois, timos } @dblab.ece.ntua.gr
2 TransAction Software GmbH Gustav-Heinemann-Ring 109,D-81739 Mtinchen, Germany
{pieringer, elhardt} @transaction.de
3 Bayerisches Forschungszentrum ftir Wissensbasierte Systeme, Orleansstrage 34, D- 81667 Miinchen,
Germany, {robert.fenk, frank.ramsak} @forwiss.de
4 IBM Almaden Research Center, K55/B 1,650 Harry Road, San Jose, CA 95120-6099, marklv@us.ibm.com
5 Institut ffir lnforlnatik, TU-Mfinchen, Orleansstrage 34, D-81667 Mfinchen, Gerlnany, bayer@in.tum.de
Abstract
Star queries are the most prevalent kind of que-
ries in data warehousing, OLAP and business in-
telligence applications. Thus, there is an impera-
tive need for efficiently processing star queries.
To this end, a new class of fact table organiza-
tions has emerged that exploits path-based surro-
gate keys in order to hierarchically cluster the
fact table data of a star schema [DRSN98,
MRB99, KS01]. In the context of these new or-
ganizations, star query processing changes radi-
cally. In this paper, we present a complete ab-
stract processing plan that captures all the neces-
sary steps in evaluating such queries over hierar-
chically clustered fact tables. Furthermore, we
present optimizations for surrogate key process-
ing and a novel early grouping transformation for
grouping on the dimension hierarchies. Our algo-
rithms have been already implemented in a
commercial relational database management sys-
tem (RDBMS) and the experimental evaluation,
as well as customer feedback, indicates speed-
ups of orders of magnitude for typical star que-
ries in real world applications.
Permission to copy without joe all or part of this" material is" granted
provided that the copies are not made or distributed for direct commer-
cial advantage, the VLDB copyright notice and the title of the publica-
tion and its date appear, and notice is" given that copying is" by permis-
sion of the Very Large Data Base Endowment. To copy otherwise, or to
republish, requires a fee and~or special permission j?om the Endowment
Proceedings of the 28 th VLDB Conference,
Hong Kong, China, 2002
1. Introduction
Data warehousing (DW) has evolved into a major trend in
database technology through the last decade. Furthermore,
the multidimensional paradigm seems to be the undis-
puted winner as a design choice for such databases. Re-
gardless of the underlying physical layer, relational tech-
nology or proprietary multidimensional structures, the
conceptual model adopted is a data warehouse consisting
of facts (or measures) organized into a set of dimensions,
which in turn are organized into levels of different aggre-
gation (i.e., detail) that comprise one or more hierarchies.
In particular, for relational databases, the multidimen-
sional data warehouse consists of one or more
star sche-
mata
[CD97a].
The information stored in a star schema is in the form
of detailed facts organized by dimension values and can
produce all kinds of invaluable insights with appropriate
querying. The most prevalent kind of query submitted to
such a system is the
star query.
Star queries impose re-
strictions on the dimension values that are used for select-
ing specific facts; these facts are further grouped and ag-
gregated according to the user demands. The major bot-
tleneck in evaluating such queries has been the join of the
central (and usually very large) fact table with the sur-
rounding dimension tables (also known as a
star join).
To
cope with this problem various indexing schemes have
been developed [NG95, NQ97, Sat97, C198, WB98,
Wu99, WOS01]. Also precomputation of aggregation
results has been studied extensively - mainly as a view
maintenance problem - and is used as a means of acceler-
ating query performance in data warehouses [GM95,
Rou98, SDJL96].
730

Get Proceedings 2002 VLDB Conference now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.