O'Reilly logo

Computer Science by Ian Sinclair

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

File processing and management
have to be kept in any order; it can be a straightforward random-access file.
Hashing methods generally have become the topic of intense study, and
there are several books devoted entirely to the subject, though many of these
are aimed at theorists rather than at programmers.
Btree methods
The Btree method is useful when a very large amount of data is held on a
disk, and a record has to be retrieved in the least possible number of disk
read actions. If a file can be held in memory and is sorted into order, the
binary search is very much quicker, because the comparisons take very little
time to perform, and no additional records or index items have to be fetched
from the disk. If, however, the number of records or index items is very
large, too large to fetch into memory, then a locating method which makes
the least possible number of disk reads is preferable. If a binary search is
conducted from disk, there will be an item read for each comparison. The
Btree method uses each disk read to fetch a larger number of items, since a
search through items, once read, will be much faster than the action of disk
reading.
In a Btree of order 5, there will be a maximum of five items at each disk
read, and these items will be sorted into order. The first set of items
constitutes the root, and might contain only one number. The first com-
parison will determine whether the wanted item is equal to this number, in
which case the data is fetched, or is greater than this number or less than this
number. The decision will lead to another disk read, this time of up to five
numbers in this example, and the comparison can be tried on all five. Once
again, if a match is not found on any number, another fetch will result in a
group of numbers from which comparisons can now be made. Because each
read action produces a larger set of numbers, the amount of time spent on
disk reading is much less than would be required for a binary search.
The algorithm for a Btree, however, can be very complex, and most
programmers will prefer to buy the routine on disk rather than develop it
single-handed. This requires the use of a modular programming language
such as C, Pascal or Modula-2 or some of the most recent versions of
BASIC, but this does not mean that a Btree routine will necessarily be
available for programmers in other languages. In general, C and Pascal
programmers are best served for libraries of these routines.
Exercise 1
Try a hashing algorithm for a list of twenty names, using a sum of
ASCII codes. Any number that exceeds fifty should have fifty sub-
tracted until the result is less than fifty. Are there any collisions in your
list?
109

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required