FAS: A Freshness-Sensitive Coordination Middleware for a Cluster of OLAP Components
Uwe Röhm
Klemens Böhm*
Hans-Jörg Schek
Heiko Schuldt
Swiss Federal Institute of Technology
ETH Zentrum, 8092 Zurich, Switzerland
{roehm,boehm,schek,schuldt}@inf.ethz.ch
Abstract
Data warehouses offer a compromise between freshness of data and query evaluation times. However, a fixed preference ratio between these two variables is too undifferentiated. With our approach, clients submit a query together with an explicit freshness limit as a new Quality-of-Service parameter. Our architecture is a cluster of databases. The contribution of this article is the design, implementation, and evaluation of a coordination middleware. It schedules and routes updates and queries to cluster nodes, aiming at a high throughput of OLAP queries. The core of the middleware is a new protocol called FAS (Freshness-Aware Scheduling) with the following qualitative characteristics: (1) the requested freshness limit of queries is always met, and (2) data accessed within a transaction is consistent, independent of its freshness. Our evaluation shows that FAS has the following properties: OLAP query-evaluation times are close (within 10%) to those of an idealized setup with no updates. FAS allows clients to effectively trade 'up-to-dateness' for query performance. Even when all queries request fresh data, FAS clearly outperforms synchronous replication. Finally, mean response times are independent of the cluster size (up to 128 nodes).

* Current affiliation: Otto-von-Guericke-Universität Magdeburg, Germany

1 Introduction
Data warehouses are closely tied to OLAP, i.e., online analytical processing of the vast amounts of data of an organization. They typically offer a compromise between freshness of data and warehouse maintenance costs. Different application scenarios and users, however, have different preferences in this respect, and a fixed preference ratio is too undifferentiated. With our approach, clients submit a query together with an explicit freshness limit as a new Quality-of-Service parameter. In other words, readers can decide, on a continuous scale, how up-to-date the data they access must be. The goal is to use this additional information to improve throughput. The concern of this article is the throughput of a stream of OLAP queries, i.e., we assume a read-mostly environment with many concurrent readers. This complements recent work on replication in OLTP scenarios, e.g., [6, 17].
The object of this study is a cluster of databases [18, 14]: a cluster of commodity computers, each node running an off-the-shelf database management system as its transactional storage layer. This paper assumes that all cluster nodes are homogeneous, i.e., they run the same DBMS with the same database schema. Each node holds a full copy of the database, but the freshness of these copies may vary between cluster nodes. Finally, we assume that there is a coordination middleware layer on top of the cluster (cf. Figure 1). Clients submit query or update transactions to this middleware instead of communicating directly with specific cluster nodes. The middleware schedules and routes updates and queries to cluster nodes. The scheduler generates a correct interleaved execution order. In general, scheduling leaves several cluster nodes on which a query may execute. The router chooses one of these nodes for each query. The objective is (1) to achieve high performance with regard to the
