File processing and management

have to be kept in any order; it can be a straightforward random-access file.

Hashing methods generally have become the topic of intense study, and

there are several books devoted entirely to the subject, though many of these

are aimed at theorists rather than at programmers.

Btree methods

The Btree method is useful when a very large amount of data is held on a

disk, and a record has to be retrieved in the least possible number of disk

read actions. If a file can be held in memory and is sorted into order, the

binary search is very much quicker, because the comparisons take very little

time to perform, and no additional records or index items have to be fetched

from the disk. If, however, the number of records or index items is very

large, too large to fetch into memory, then a locating method which makes

the least possible number of disk reads is preferable. If a binary search is

conducted from disk, there will be an item read for each comparison. The

Btree method uses each disk read to fetch a larger number of items, since a

search through items, once read, will be much faster than the action of disk

reading.

In a Btree of order 5, there will be a maximum of five items at each disk

read, and these items will be sorted into order. The first set of items

constitutes the root, and might contain only one number. The first com-

parison will determine whether the wanted item is equal to this number, in

which case the data is fetched, or is greater than this number or less than this

number. The decision will lead to another disk read, this time of up to five

numbers in this example, and the comparison can be tried on all five. Once

again, if a match is not found on any number, another fetch will result in a

group of numbers from which comparisons can now be made. Because each

read action produces a larger set of numbers, the amount of time spent on

disk reading is much less than would be required for a binary search.

The algorithm for a Btree, however, can be very complex, and most

programmers will prefer to buy the routine on disk rather than develop it

single-handed. This requires the use of a modular programming language

such as C, Pascal or Modula-2 or some of the most recent versions of

BASIC, but this does not mean that a Btree routine will necessarily be

available for programmers in other languages. In general, C and Pascal

programmers are best served for libraries of these routines.

Exercise 1

Try a hashing algorithm for a list of twenty names, using a sum of

ASCII codes. Any number that exceeds fifty should have fifty sub-

tracted until the result is less than fifty. Are there any collisions in your

list?

109

Start Free Trial

No credit card required