1.2 Examples 11
unique to packet traffic is that what the user views as a single transmission is
broken into packets at the source to be sent, possibly through different routes,
to the destination. The packets arrive at the destination in no particular order
and then must be reassembled in the proper order before the complete message is
delivered to the destination user.
This kind of data is different from all those mentioned previously in that the
data records (packets) are not standalone items but must be aggregated into
complete transmissions in real time at high speed. Student records are expected to be
persistent and long term, but Internet messages are packetized, sent, collected, and
then discarded. This kind of processing requires a much different data structure
from that used previously in order to accommodate these different needs.
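Reassembly of this kind can be sketched with a buffer keyed by sequence number. The packet format used here, a (sequence number, total count, payload) triple, is an assumption made for illustration, not the format of any actual Internet protocol:

```python
# Sketch of out-of-order packet reassembly. The (seq, count, payload)
# packet format is hypothetical, chosen only to illustrate the idea.

def reassemble(packets):
    """Buffer packets by sequence number; return the message once complete."""
    buffer = {}                       # sequence number -> payload
    total = None
    for seq, count, payload in packets:
        buffer[seq] = payload
        total = count                 # every packet carries the total count
    if total is None or len(buffer) != total:
        return None                   # transmission incomplete; keep waiting
    return "".join(buffer[i] for i in range(total))

# Packets arriving in no particular order still assemble correctly:
msg = reassemble([(2, 3, "world!"), (0, 3, "Hel"), (1, 3, "lo, ")])
# -> "Hello, world!"
```

Note that the buffer is short-lived, matching the text's observation: once the message is assembled and delivered, the per-packet data can be discarded.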
1.2.5 Process Queues
Similar in some ways to Internet packet traffic are the process queues managed by
the operating system in any modern computer. Even on a standard desktop there
will be dozens of processes in some state of execution at any point in time. Some of
these are processes that are always running; others are created when the
user fires up a word processor, an edit window, an Internet browser, and so forth.
The task of the operating system's scheduler is to determine which process to
execute next. As with packet traffic, these process queues must be managed in real
time with as little effort as possible expended by the scheduler. Processes need to
have their priorities updated at intervals, and the effects of the updated priorities
need to be available immediately. The data structure used to implement the process
queues needs to be highly dynamic and easily changed, and the processes with the
highest priorities need to be quickly accessible.
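A minimal sketch of such a process queue can be built on a binary heap (Python's `heapq`), assuming the common convention that smaller numbers denote higher priority; the process names and priority values below are hypothetical:

```python
import heapq
import itertools

# Sketch of a scheduler's ready queue, assuming smaller numbers mean
# higher priority (a common, but not universal, convention).

class ProcessQueue:
    def __init__(self):
        self._heap = []                    # entries: (priority, tie-break, name)
        self._counter = itertools.count()  # breaks ties in insertion order

    def add(self, name, priority):
        heapq.heappush(self._heap, (priority, next(self._counter), name))

    def next_process(self):
        """Remove and return the name of the highest-priority process."""
        priority, _, name = heapq.heappop(self._heap)
        return name

q = ProcessQueue()
q.add("browser", 5)
q.add("kernel_logger", 1)
q.add("editor", 5)
print(q.next_process())   # -> kernel_logger
```

Both operations run in logarithmic time, so the scheduler's effort stays small. Periodic priority updates are commonly handled by re-inserting a process with its new priority and discarding stale heap entries as they surface, a standard "lazy deletion" technique.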
All the preceding examples of data are of fixed-format records. These are records
that can be thought of in spreadsheet form; each record has a list of fields, and
each field has a specific data type. Although some fields might be of variable
length, the sequence of the fields and the data type for each field are the same
across all records. Further, there is some reason to believe that the data as stored
will actually make sense, because the records have been created by an authoritative
source or by means of software that has presumably been written so as to produce
reasonably correct records.
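As a sketch, a fixed-format record of this kind can be modeled as a typed structure in which every record shares the same field sequence and data types; the field names below are illustrative, not drawn from any particular student-records system:

```python
from dataclasses import dataclass
from datetime import date

# Sketch of a fixed-format record: same fields, same order, same types
# across all records. Field names here are purely illustrative.

@dataclass
class StudentRecord:
    student_id: int
    name: str          # variable length, but always a string
    enrolled: date
    gpa: float

rec = StudentRecord(1001, "Ada Lovelace", date(2024, 9, 1), 3.9)
```

A field such as `name` may vary in length from record to record, but its position and type never change, which is exactly what makes spreadsheet-style processing possible.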
All that goes out the window when one considers the Google problem of analyzing
all the web pages in the world. Web pages are not fixed-format; they are highly
changeable over time; they are not created by an authoritative source; they may
be rife with misspellings or typographical errors; and they are not guaranteed to