Searching (and Indexing) in General

In the exercises for this chapter, you will be addressing the subject of searching for data. A considerable part of the discussion will relate to indexing names of data files. You may wonder why. The discussion below will explain the rudiments of indexing (which is basically sorting) and why it is important.

A computer can easily search a list of items for a particular item. It simply checks each item in the list against the “search string,” and when it finds the two are equal, it declares success. For a trivial example, suppose we had the list of strings, each consisting of three characters and each associated with a number:

GHU  2343
UCK  7765
OPR  8828
PRO  1234
ZYX  7876
QWA  9500
ASD  3456

If we wanted to find the number associated with QWA, we could program the computer to compare the query (text string) QWA with the first string in the list: GHU. No match, so we go on to the next string (UCK). Again, no match. Finally, we would come to QWA in the list and get our answer: 9500. As you can probably guess, on average we would have to look through half the list to find a given string. In this case, it took six comparisons before the string was matched.

However, if we sort the list alphabetically (carrying along the associated number) we would have:

ASD 3456
GHU 2343
OPR 8828
PRO 1234
QWA 9500
UCK 7765
ZYX 7876

Now we can apply a clever strategy. We look first at the middle of the list: PRO. It doesn’t match QUA, but we know that QUA is alphabetically ...

Get Introducing Geographic Information Systems with ArcGIS: A Workbook Approach to Learning GIS, 3rd Edition now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.