DTD-Directed Publishing with Attribute Translation Grammars
Michael Benedikt Chee Yong Chan Wenfei Fan Rajeev Rastogi
Bell Laboratories, Lucent Technologies
{benedikt, cychan, wenfei, rastogi}@research.bell-labs.com
Shihui Zheng Aoying Zhou
Fudan University
{shzhengO, ayzhou}@fudan.edu.cn
Abstract
We present a framework for publishing relational
data in XML with respect to a fixed DTD. In data
exchange on the Web, XML views of relational
data are typically required to conform to a prede-
fined DTD. The presence of recursion in a DTD as
well as non-determinism makes it challenging to
generate DTD-directed, efficient transformations.
Our framework provides a language for defining
views that are guaranteed to be DTD-conformant,
as well as middleware for evaluating these views.
It is based on a novel notion of attribute translation
grammars (ATGs). An ATG extends a DTD
by associating semantic rules via SQL queries.
Directed by the DTD, it extracts data from a re-
lational database, and constructs an XML docu-
ment. We provide algorithms for efficiently eval-
uating ATGs, along with methods for statically an-
alyzing them. This yields a systematic and effec-
tive approach to publishing data with respect to a
predefined DTD.
1 Introduction
XML [6] has become the primary standard for data ex-
change on the Web. To exchange data currently residing
in relational databases, one needs to publish it in XML,
i.e., to transform the data into an XML format. In practice,
publishing of relational data is always done with a predefined
type, typically a DTD. A community or industry agrees on
a certain DTD, and subsequently all members of the
community create XML views of their relational data that
conform to the DTD [3]. This is common in, e.g., B2B
applications and the health-care industry: a hospital needs to
extract patient information from its relational store, convert
it to an XML format, and send it to an insurance company,
with the XML data generated conforming to a DTD defined
by the insurance company.

Permission to copy without fee all or part of this material is granted
provided that the copies are not made or distributed for direct commercial
advantage, the VLDB copyright notice and the title of the publication and
its date appear, and notice is given that copying is by permission of the
Very Large Data Base Endowment. To copy otherwise, or to republish,
requires a fee and/or special permission from the Endowment.
Proceedings of the 28th VLDB Conference,
Hong Kong, China, 2002

The problem can be stated as follows: given a DTD D
and a relational schema R, define a view σ such that for any
instance I of R, σ(I) is an XML document that conforms
to D. We refer to this as DTD-directed publishing. The
goal is to provide a DTD-directed publishing system that
captures transformations commonly found in practice.
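
To make the conformance requirement concrete, the following is a minimal sketch of a run-time check that a document conforms to a DTD. It is an illustration only: the toy DTD, the element names, and the function conforms are assumptions introduced here, and the check is dynamic, whereas the framework described in this paper aims to guarantee conformance by construction of the view.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical toy DTD: one regular expression per element type.
# Each regex is matched against the child tags, each followed by a space.
TOY_DTD = {
    "db": r"(part )*",
    "part": r"pname (part )*",   # recursive: part contains parts
    "pname": r"",                # PCDATA leaf: no child elements
}

def conforms(elem, dtd):
    """Return True if elem and all its descendants match their content models."""
    model = dtd.get(elem.tag)
    if model is None:
        return False  # element type is not declared in the DTD
    children = "".join(child.tag + " " for child in elem)
    if re.fullmatch(model, children) is None:
        return False
    return all(conforms(child, dtd) for child in elem)

good = ET.fromstring(
    "<db><part><pname>bolt</pname>"
    "<part><pname>nut</pname></part></part></db>"
)
bad = ET.fromstring("<db><part><part><pname>nut</pname></part></part></db>")
```

Here conforms(good, TOY_DTD) holds, while conforms(bad, TOY_DTD) fails because the outer part lacks its pname child.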
DTD-directed publishing is rather challenging. The
presence of disjunction in a DTD leads to difficulties in
defining deterministic mappings based on the DTD, while
recursion makes for a poor match with the querying fa-
cilities of standard relational databases. Recursive DTDs
are commonly found in specifications of biomedical [5],
protein [20] and chemical data [9], e.g., DNA is specified
in terms of clone, clone has subelements gene and DNA,
while gene is in turn specified with DNA. As a simple ex-
ample, let us consider a mild variation of a fragment of
the TPC-H relational schema [24] shown in Fig. 1 (with
keys underlined). The schema, referred to as R0,
specifies parts, suppliers of those parts, and the composition of
a part from other parts. Suppose that one wants to define
an XML view that extracts information about parts with the
brand "Acme" from the relational database. For each part
the view provides the name, suppliers and moreover, the
part-hierarchy composing it: its sub-parts, the sub-parts of
those sub-parts, and so on. In addition, the XML document
generated is to conform to a DTD D0 given in Fig. 2
(here we omit the description of elements whose type is
PCDATA). Observe that D0 is recursive: part is defined in
terms of itself. Moreover, the structure of the address is
non-deterministic: if the supplier is "domestic", i.e., based
in the US, its address is simply the addr attribute of the
Supplier relation; otherwise, i.e., if it is "foreign", its
address consists of the addr attribute and its nation.
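
The choice between the two address structures can be sketched as a data-dependent branch. The element names (address, addr, nation), the function build_address, and the use of the string "US" to mark domestic suppliers are assumptions made for illustration; Fig. 2 may structure the disjunction differently.

```python
import xml.etree.ElementTree as ET

def build_address(addr, nation, home_nation="US"):
    """Build an address element whose shape depends on the supplier's nation.

    Hypothetical structure: a domestic address holds only addr,
    while a foreign address holds addr followed by nation.
    """
    address = ET.Element("address")
    ET.SubElement(address, "addr").text = addr
    if nation != home_nation:
        # Foreign supplier: the disjunction resolves to (addr, nation).
        ET.SubElement(address, "nation").text = nation
    return address

domestic = ET.tostring(build_address("1 Main St", "US"), encoding="unicode")
foreign = ET.tostring(build_address("2 Rue A", "FR"), encoding="unicode")
```

The point of the sketch is that the branch taken is decided by the relational data, not by the DTD alone, which is why a deterministic mapping has to consult the database.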
Given an instance of R0, the goal is to generate an XML
document conforming to DTD D0. In the document, parts are nested to
an arbitrary level which is not known at compile time, but
is rather data-driven, i.e., determined by the relational data.
This is an instance of DTD-directed publishing.
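
The data-driven nesting can be sketched with a naive recursive traversal that issues one SQL query per part. This is not the evaluation strategy of the framework; it only shows why the nesting depth depends on the data. The table and column names (Part, SubPart, pid, pname, brand, sub_pid), the root element db, and both functions are assumptions, since Fig. 1 and Fig. 2 are not reproduced here. Suppliers are omitted for brevity, and the sub-part relation is assumed to be acyclic.

```python
import sqlite3
import xml.etree.ElementTree as ET

def build_part(conn, pid, pname):
    """Build a part element, recursing into sub-parts (assumes no cycles)."""
    part = ET.Element("part")
    ET.SubElement(part, "pname").text = pname
    subs = conn.execute(
        "SELECT p.pid, p.pname FROM SubPart s "
        "JOIN Part p ON p.pid = s.sub_pid "
        "WHERE s.pid = ? ORDER BY p.pid",
        (pid,),
    ).fetchall()
    for sub_pid, sub_name in subs:
        part.append(build_part(conn, sub_pid, sub_name))
    return part

def publish_parts(conn, brand):
    """Publish all parts of the given brand under a hypothetical root 'db'."""
    root = ET.Element("db")
    rows = conn.execute(
        "SELECT pid, pname FROM Part WHERE brand = ? ORDER BY pid", (brand,)
    ).fetchall()
    for pid, pname in rows:
        root.append(build_part(conn, pid, pname))
    return root

# Tiny in-memory instance with a three-level hierarchy.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Part (pid INTEGER PRIMARY KEY, pname TEXT, brand TEXT);
    CREATE TABLE SubPart (pid INTEGER, sub_pid INTEGER);
    INSERT INTO Part VALUES (1, 'engine', 'Acme'),
                            (2, 'piston', 'Other'),
                            (3, 'ring', 'Other');
    INSERT INTO SubPart VALUES (1, 2), (2, 3);
""")
xml_text = ET.tostring(publish_parts(conn, "Acme"), encoding="unicode")
```

The depth of the resulting document (three levels here) is fixed only once the SubPart rows are known, which a single non-recursive SQL query cannot anticipate.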