1.1 The Course of This Course 5
template for a receipt record. The boilerplate store information would be easy to
add; the date and time would change but would be easy to obtain from an internal
clock; and the only complexity comes from the fact that one has a variable number
of items to record.
This kind of structured data is very diﬀerent from a web page or a collection
of web pages forming an entire website, for example, which would have free text,
links, internal horizontal references, etc. In a retail store, one would assume that
the list of items for sale could be found somewhere in a ﬁle on a master computer.
For a randomly chosen website, though, no master list of the words used on the site
would be expected to exist. That list of words would have to be built dynamically by
reading through the pages. With a retail transaction, we expect header and footer
information and a list of items and prices in the middle. With the pages of a website,
we have no reason to know what to expect in terms of the internal and external
URL references. We have no reason to know the length of the pages or where the
content will occur. If we are to build a data structure to hold that information, we
must use a structure that can adapt automatically as we read the data.
1.1.6 Dynamic Versus Static Behavior
As soon as one introduces the notion of change to a data ﬁle (or more generally to
a data object) and sets about to decide upon a data structure for that ﬁle, one is
forced to consider the operational characteristics of the dynamic activity on that
data. The questions that need to be addressed include the following:
• What is the size of the data object?
• What is the size of the object relative to the ongoing change to the object?
• What happens to data items when they are “deleted” from the collection of
With the data ﬁle of IRS records on U.S. taxpayers, for example, it is probably
a reasonable assumption that the base ﬁle is enormous relative to the daily changes
to the ﬁle. There are hundreds of millions of taxpayer IDs, and no doubt billions
of individual records of tax payments, but the rate of change of the data object
is probably relatively small. If the data object consists of the active processes in a
computer, however, the number of processes will be small (in the low hundreds at
most), but a large number of the processes will be very short lived and the turnover
will be very high. Similarly, the servers of an ISP will have at any time a collection
of individual packets of Internet transmissions, but the set of packets at any given
time will be constantly changing. Diﬀerent rates of change will require diﬀerent