Optimizing View Queries in ROLEX
to Support Navigable Result Trees
P. Bohannon S. Ganguly H. F. Korth P.P.S. Narayan P. Shenoy
Lucent Technologies, Bell Laboratories
600 Mountain Avenue
Murray Hill, NJ 07974 USA
{bohannon,sganguly,hfk,ppsnarayan}@lucent.com
pshenoy@cs.washington.edu
Abstract
An increasing number of applications use XML data published from relational databases. For speed and convenience, such applications routinely cache this XML data locally and access it through standard navigational interfaces such as DOM, sacrificing the consistency and integrity guarantees provided by a DBMS. The ROLEX system is being built to extend the capabilities of relational database systems to deliver fast, consistent, and navigable XML views of relational data to an application via a virtual DOM interface. This interface translates navigation operations on a DOM tree into execution-plan actions, allowing a spectrum of possibilities for lazy materialization. The ROLEX query optimizer uses a characterization of the navigation behavior of an application and optimizes view queries to minimize the expected cost of that navigation. This paper presents the architecture of ROLEX, including its model of query execution and its query optimizer. Through a performance study, we demonstrate the advantages of the ROLEX approach and the importance of optimizing query execution for navigation.
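The two central ideas above, lazily materializing a virtual DOM as the application navigates and choosing a plan by the expected cost of navigation, can be illustrated with a toy Python sketch. It does not reproduce ROLEX's interface or cost model, which this section does not specify: the class VirtualNode, the fetch callback, choose_plan, the visit probabilities, and the cost figures are all hypothetical. Each call to fetch stands for one execution-plan action. A lazy plan pays a per-expansion cost only for nodes that are actually visited, while an eager plan is modeled as a single fixed up-front cost.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical "execution-plan action": given a node key, return its child keys.
# In this toy model each call stands for one round of query execution.
FetchFn = Callable[[str], List[str]]


class VirtualNode:
    """A DOM-like node whose children are materialized only when first navigated to."""

    def __init__(self, key: str, fetch: FetchFn) -> None:
        self.key = key
        self._fetch = fetch
        self._children: Optional[List["VirtualNode"]] = None

    def children(self) -> List["VirtualNode"]:
        if self._children is None:  # first navigation: run the plan action once
            self._children = [VirtualNode(k, self._fetch) for k in self._fetch(self.key)]
        return self._children  # later navigations reuse the materialized children


def expected_lazy_cost(visit_prob: Dict[str, float], cost_per_expand: float) -> float:
    """Each node is expanded (one query) with the probability that it is visited."""
    return sum(p * cost_per_expand for p in visit_prob.values())


def choose_plan(visit_prob: Dict[str, float], cost_per_expand: float,
                eager_cost: float) -> str:
    """Pick whichever plan is cheaper in expectation (a toy cost model)."""
    lazy = expected_lazy_cost(visit_prob, cost_per_expand)
    return "lazy" if lazy < eager_cost else "eager"


# Demo: a small hotel/room tree backed by a dictionary instead of a DBMS.
TREE = {"hotels": ["h1", "h2"], "h1": ["r1", "r2"], "h2": ["r3"]}
calls: List[str] = []


def fetch(key: str) -> List[str]:
    calls.append(key)  # record every execution-plan action
    return TREE.get(key, [])


root = VirtualNode("hotels", fetch)
root.children()[0].children()  # navigate to h1's rooms; h2 is never expanded
print(calls)  # ['hotels', 'h1']

profile = {"hotels": 1.0, "h1": 0.9, "h2": 0.1}
print(choose_plan(profile, cost_per_expand=10.0, eager_cost=25.0))  # lazy
```

With the assumed profile, the lazy plan's expected cost is 1.0 × 10 + 0.9 × 10 + 0.1 × 10 = 20, below the eager cost of 25. If every node were visited with certainty, the lazy cost would rise to 30 and the eager plan would win, which is why the navigation profile matters to plan choice.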
1 Introduction
XML has gained widespread popularity as a standard for information representation and exchange. Infrastructure software for business hubs, supply-chain integration, and catalog management uses XML encodings. Standards bodies for business data exchange, such as RosettaNet [21] and Oasis-Open [18], are extremely active. The result is a tremendous focus on incorporating support for XML in application-development and data-management tools.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.
Proceedings of the 28th VLDB Conference,
Hong Kong, China, 2002
In some cases, an XML-based application may be developed from scratch and may require a storage facility for XML documents [3, 25]. In most cases, however, the XML-based application must interoperate with existing SQL-centric applications. In the typical "shred-and-publish" approach to interoperation, incoming XML data is parsed (shredded) into relational tables, and outgoing data is extracted by SQL engines and then formatted (published) as XML. For example, a database supporting an SQL-based hotel-reservation application may also be called on to support a web site or to exchange XML with a third-party "hub" for the travel industry.
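The shredding half of the shred-and-publish approach can be sketched with Python's standard xml.etree.ElementTree and sqlite3 modules. The hotel and room tables, their columns, and the sample document are hypothetical, chosen to match the hotel-reservation example; in practice the mapping would be dictated by the existing SQL-centric schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical incoming document from a travel-industry hub.
INCOMING = """
<hotels>
  <hotel id="1" name="Harbor Inn">
    <room number="101" rate="120"/>
    <room number="102" rate="95"/>
  </hotel>
  <hotel id="2" name="Peak Lodge">
    <room number="201" rate="150"/>
  </hotel>
</hotels>
"""


def shred(xml_text: str, conn: sqlite3.Connection) -> None:
    """Parse XML and insert one row per element into flat relational tables."""
    root = ET.fromstring(xml_text)
    for h in root.findall("hotel"):
        hotel_id = int(h.get("id"))
        conn.execute("INSERT INTO hotel(id, name) VALUES (?, ?)", (hotel_id, h.get("name")))
        for r in h.findall("room"):
            # Nesting is encoded as a foreign key from room to its parent hotel.
            conn.execute(
                "INSERT INTO room(hotel_id, number, rate) VALUES (?, ?, ?)",
                (hotel_id, int(r.get("number")), int(r.get("rate"))),
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hotel(id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE room(hotel_id INTEGER REFERENCES hotel(id), number INTEGER, rate INTEGER);
""")
shred(INCOMING, conn)

# An existing SQL-centric application can now query the shredded data directly.
cheap = conn.execute(
    "SELECT h.name, r.number FROM room r JOIN hotel h ON h.id = r.hotel_id "
    "WHERE r.rate < 130 ORDER BY r.number"
).fetchall()
print(cheap)  # [('Harbor Inn', 101), ('Harbor Inn', 102)]
```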
Maintaining the mapping between the relational data source and the associated XML documents is complex and error-prone. Fortunately, recently developed middleware systems for XML publishing [5, 8] greatly ease this task by providing a declarative language in which a view query specifies the desired mapping. The middleware translates the view query into one or more SQL queries for execution on the underlying DBMS, and a tagger process constructs an XML document from the result. Furthermore, commercial relational and object-relational databases are becoming "XML-enabled" [20] by integrating certain middleware functionality into the DBMS. This may entail, for example, supporting a modified SQL syntax that outputs XML, or allowing XPath queries against an XML view of the database.
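The publishing half, in which SQL generated from a view query returns flat rows and a tagger nests them into XML, can be sketched as follows. This is a simplified illustration, not the algorithm of any particular middleware system [5, 8]: the VIEW_SQL text, the tag function, and the schema are hypothetical. The tagger assumes the rows arrive sorted by the parent key (hotel id), so while scanning it only needs to track the current parent element.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical SQL that a middleware layer might generate from a view query:
# one flat result, sorted so that each hotel's rooms arrive contiguously.
VIEW_SQL = """
SELECT h.id, h.name, r.number, r.rate
FROM hotel h LEFT JOIN room r ON r.hotel_id = h.id
ORDER BY h.id, r.number
"""


def tag(rows) -> ET.Element:
    """Nest flat rows (sorted by hotel id) into an XML element tree."""
    root = ET.Element("hotels")
    current_id, current = None, None
    for hid, name, number, rate in rows:
        if hid != current_id:  # a new parent key starts a new <hotel> element
            current_id = hid
            current = ET.SubElement(root, "hotel", id=str(hid), name=name)
        if number is not None:  # the LEFT JOIN yields NULLs for hotels with no rooms
            ET.SubElement(current, "room", number=str(number), rate=str(rate))
    return root


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hotel(id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE room(hotel_id INTEGER REFERENCES hotel(id), number INTEGER, rate INTEGER);
INSERT INTO hotel VALUES (1, 'Harbor Inn'), (2, 'Peak Lodge'), (3, 'Dune Motel');
INSERT INTO room VALUES (1, 101, 120), (1, 102, 95), (2, 201, 150);
""")

doc = tag(conn.execute(VIEW_SQL))
print(ET.tostring(doc, encoding="unicode"))
```

Note that this tagger materializes the entire document up front, regardless of which parts the application later navigates.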
1.1 Caching and the "Back-Room" DBMS
Application caching of database data is widespread, particularly in the web-facing applications that XML middleware systems are designed to support. Data is cached primarily for performance, and an experimental study by Labrinidis and Roussopoulos [16] of caching web data both inside and outside the DBMS illustrates the problem: in almost every experiment, caching outside the DBMS offered two orders of magnitude better performance than caching within it.