4. Scalability in network size: As the number of nodes increases, the
system’s total storage capacity should increase, and the commu-
nication cost of the system should not grow unduly.
5. Load balancing: Storage should not unduly burden any one
node. Nor should any node become a concentration point of
communication.
6. Topological generality: The database architecture should work well
on a broad range of network topologies.
6.5 In-Network Aggregation
Let us now see how in-network query processing can be used to pro-
vide substantial energy savings when serving aggregate queries. This
savings is possible because combining data at intermediate nodes
reduces the overall number of messages the network has to trans-
mit, thus reducing communication and prolonging the lifetime of
the network.
6.5.1 Query Propagation and Aggregation
Consider the following simple example, where an average reading
is computed over a network of six nodes arranged in a three-level
routing tree (Figure 6.2). In the server-based approach, where the
aggregation occurs at an external server, each sensor sends its data
directly to the server. This requires a total of 16 message trans-
missions. Alternatively, each sensor may compute a partial state
record, consisting of {sum, count}, based on its data and that of its
children, if there are any. This requires a total of only six message
transmissions.¹
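As a rough sketch of the bookkeeping behind this comparison, the fragment below counts messages for both approaches over a hypothetical six-node routing tree. The tree shape, node names, and resulting totals are illustrative assumptions, not the exact topology of Figure 6.2:

```python
# Hypothetical routing tree: parent[] maps each sensor node to its
# parent; None means the parent is the external server / base station.
parent = {"a": None, "b": "a", "c": "a", "d": "c", "e": "c", "f": "c"}

def depth(node):
    """Hop count from a node to the external server."""
    d = 0
    while node is not None:
        node = parent[node]
        d += 1
    return d

# Server-based: every reading is relayed hop by hop to the server,
# so each node's reading costs depth(node) transmissions.
server_msgs = sum(depth(n) for n in parent)

# In-network: each node sends exactly one partial state record
# (its own data merged with its children's) to its parent.
in_network_msgs = len(parent)

print(server_msgs, in_network_msgs)
```

With this particular tree the server-based approach uses 14 transmissions versus 6 for in-network aggregation; the exact numbers depend on the tree shape, but in-network aggregation always needs only one message per node.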
In-network aggregation and query processing typically involve
query propagation (or distribution) and data aggregation (or collec-
tion). To push a query to every node in a network, an efficient routing
1 We assume each partial state record can fit into a single message.
Figure 6.2 External server (a) versus in-network aggregation (b). In (a), the number at each node
denotes the number of hops away from the external server; in (b), nodes forward partial state
records such as f(a, b) and f(c, d, f(a, b)) up the tree (adapted from [149]).
structure has to be established. For example, a routing tree rooted at
a base station could be used for this purpose. A query may be prop-
agated through the routing structure using a broadcast mechanism
(i.e., flooding the network). Or it may use multicast to reach only
those nodes that may contribute to the query. For example, if the
query's having-predicate specifies a geographic region, then portions
of the routing structure that do not satisfy the predicate may be
omitted from the query propagation.
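A minimal sketch of such predicate-aware propagation might look as follows; the `children` and `subtree_region` tables are hypothetical, standing in for routing state that would be built when the tree is established:

```python
# Hypothetical routing tree and coverage information: subtree_region[n]
# is the set of grid cells covered by node n's entire subtree
# (assumed to be precomputed during routing-tree construction).
children = {"root": ["a", "b"], "a": ["c"], "b": [], "c": []}
subtree_region = {
    "root": {1, 2, 3, 4},
    "a": {1, 2},
    "b": {4},
    "c": {2},
}

def propagate(node, query_region, reached):
    """Forward the query only into subtrees that overlap the query region."""
    if subtree_region[node] & query_region:  # prune non-overlapping branches
        reached.append(node)
        for child in children[node]:
            propagate(child, query_region, reached)

reached = []
propagate("root", {1, 2}, reached)
print(reached)  # the subtree under "b" is never sent the query
```

Compared with flooding, only nodes whose subtrees can contribute to the query ever receive it.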
Once a query has been distributed to all the nodes that satisfy
the query conditions, data is then collected and aggregated within
the network, utilizing the same routing structure. This gives rise to
a number of important issues. Which aggregates can be computed
piecewise and then combined incrementally? How should the activ-
ities of listening, processing, and transmitting be scheduled to
minimize the communication overhead and reduce latency? How
does the aggregation adapt to changing network structure and lossy
communication? As discussed earlier, depending on whether a query
is a snapshot, historical, or continuous, the aggregations are done
over time windows of different sizes. For a snapshot query such as
“report the current maximum temperature in a region,” the aggre-
gation needs to be completed within a small window of time within
which the temperature is not likely to vary significantly. For other
applications, the nodes must time-synchronize more accurately in
order to correctly return the result. An example is the counting of
moving vehicles on a highway. The vehicle movement may be such
that different sensors may double count, or undercount, the vehicles
if detections are not time stamped accurately. One major difference
from a traditional database system is that for long-running queries
in monitoring applications, a sensor network database returns a
stream of aggregates, one per sampling period. A key challenge for
in-network aggregation is the design of an optimal data aggregation
schedule that is energy- and time-efficient.
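The stream-of-aggregates behavior can be sketched as follows for a continuous MAX query; the topology, per-epoch readings, and `epoch_max` helper are all illustrative assumptions:

```python
# Hypothetical routing tree for a continuous MAX query. In a real
# deployment each node would sample, listen for its children's partial
# results during a scheduled slot, merge, and then transmit.
children = {"root": ["a", "b"], "a": ["c", "d"], "b": [], "c": [], "d": []}

def epoch_max(node, reading):
    """Post-order combine: a node merges its children's maxima with its own reading."""
    partials = [epoch_max(child, reading) for child in children[node]]
    return max([reading[node]] + partials)

# Two sampling periods (epochs) of made-up temperature readings.
epochs = [
    {"root": 20.1, "a": 22.5, "b": 19.0, "c": 25.3, "d": 21.0},
    {"root": 18.2, "a": 24.0, "b": 26.7, "c": 20.0, "d": 19.5},
]
stream = [epoch_max("root", r) for r in epochs]
print(stream)  # one MAX value per sampling period
```

Each epoch yields exactly one aggregate at the root, so a long-running query produces a stream of results rather than a single answer.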
6.5.2 TinyDB Query Processing
As an example, we consider TinyDB, a system designed to support
in-network aggregate query processing [149]. TinyDB provides an
SQL-style declarative query interface, and implements aggregation
mechanisms that are sensitive to resource constraints and lossy com-
munication. An example of TinyDB queries was given earlier in
Section 6.2.
TinyDB supports five SQL operators: count, min, max, sum, average,
and two extensions, median and histogram. Aggregation is imple-
mented via a merging function f , an initializer i, and an evaluator e.
The merging function f computes:
z=f (x, y),
where x and y are multivalued partial state records. For example,
for average, a partial state record ⟨S, C⟩ consists of the sum and
count of the sensor values it represents. The quantity z is the
resulting partial state record, summarizing the data represented by
x and y:
f(⟨S₁, C₁⟩, ⟨S₂, C₂⟩) = ⟨S₁ + S₂, C₁ + C₂⟩.
The initializer i specifies the construction of a state record from a sin-
gle sensor value. For average, this is i(x) = ⟨x, 1⟩. Finally, the evaluator
e computes the final aggregate from a partial state record; for average,
this is e(⟨S, C⟩) = S/C.
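Under these definitions, a minimal sketch of average aggregation over a routing tree might look like this; the tree, the readings, and the evaluator's S/C formula for average are assumptions for illustration:

```python
def i(x):
    """Initializer: build a partial state record <sum, count> from one reading."""
    return (x, 1)

def f(rec1, rec2):
    """Merging function: f(<S1,C1>, <S2,C2>) = <S1+S2, C1+C2>."""
    return (rec1[0] + rec2[0], rec1[1] + rec2[1])

def e(rec):
    """Evaluator (assumed for average): final value S / C."""
    return rec[0] / rec[1]

# Hypothetical routing tree and readings; each node merges its own
# record with its children's before forwarding one record upward.
children = {"a": ["b", "c"], "b": [], "c": ["d"], "d": []}
reading = {"a": 10.0, "b": 20.0, "c": 30.0, "d": 40.0}

def collect(node):
    rec = i(reading[node])
    for child in children[node]:
        rec = f(rec, collect(child))
    return rec

print(e(collect("a")))  # 25.0
```

Because f is commutative and associative, the partial records can be merged in any order as they flow up the tree, and each node still transmits only a single ⟨sum, count⟩ pair.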
