Translating Web Data
Lucian Popa†
Yannis Velegrakis‡
Renée J. Miller‡
Mauricio A. Hernández†
Ronald Fagin†
†IBM Almaden Research Center
650 Harry Road
San Jose, CA 95120
‡University of Toronto
6 King's College Road
Toronto, ON M5S 3H5
Abstract
We present a novel framework for mapping
between any combination of XML and rela-
tional schemas, in which a high-level, user-
specified mapping is translated into semanti-
cally meaningful queries that transform source
data into the target representation. Our ap-
proach works in two phases. In the first phase,
the high-level mapping, expressed as a set
of inter-schema correspondences, is converted
into a set of mappings that capture the design
choices made in the source and target schemas
(including their hierarchical organization as
well as their nested referential constraints).
The second phase translates these mappings
into queries over the source schemas that pro-
duce data satisfying the constraints and struc-
ture of the target schema, and preserving the
semantic relationships of the source. Non-
null target values may need to be invented in
this process. The mapping algorithm is com-
plete in that it produces all mappings that are
consistent with the schema constraints. We
have implemented the translation algorithm in
Clio, a schema mapping tool, and present our
experience using Clio on several real schemas.
1 Introduction
An important issue in modern information systems
and e-commerce applications is providing support
for inter-operability of independent data sources. A
broad variety of data is available on the Web in dis-
tinct heterogeneous sources, stored under different
formats: database formats (e.g., relational model),
semi-structured formats (e.g., DTDs, SGML or XML
Schema), scientific formats, etc. Integration of such
data is an increasingly important problem. Nonetheless,
the effort involved in such integration, in practice,
is considerable: translation of data from one format
(or schema) to another requires writing and managing
complex data transformation programs or queries.

Permission to copy without fee all or part of this material is
granted provided that the copies are not made or distributed for
direct commercial advantage, the VLDB copyright notice and
the title of the publication and its date appear, and notice is
given that copying is by permission of the Very Large Data Base
Endowment. To copy otherwise, or to republish, requires a fee
and/or special permission from the Endowment.
Proceedings of the 28th VLDB Conference,
Hong Kong, China, 2002

We present a new, comprehensive solution to build-
ing, refining and managing transformations between
heterogeneous schemas. Given the prevalent use of
the Web for data exchange, any data translation tool
must handle not only relational data but also data
represented in nested data models that are commonly
available on the Web. Our solutions are applicable to
any structured and semi-structured data that can be
described by a schema (a relational schema, a nested
XML Schema or DTD). We do not, in this work, con-
sider the exchange of documents or unstructured data
(including multimedia and unstructured text). Our
approach can be distinguished by its treatment of two
fundamental issues. We discuss them below, highlight-
ing our contributions and the main related work.
Heterogeneous Semantics. We consider the schema mapping
problem, where we are given a pair of independently
created schemas and asked to translate data
from one (the source) to the other (the target). The
schemas may have different semantics, and this may
be reflected in differences in their logical structures
and constraints. In contrast, most work on heterogeneous
data focuses on the schema integration problem,
where the target (global) schema is created from one
or more source (local) schemas (and designed as a view
over the sources) [8].¹ The target is created to reflect
the semantics of the source and has no independent
semantics of its own. Even our own earlier work on
schema mapping considered the problem of mapping
a source schema (with a rich logical structure) into a
flat (single table) target schema with no constraints,
thus ignoring half of the more general problem [12]. In
contrast, Section 2 gives a semantic translation algorithm
that preserves semantic relationships during the
translation from source to target, where the source and

¹ Alternatively, in a local-as-view approach, each source
schema is modeled as a view on the target schema [8].
