The Biggest Database in the World

Probably the largest database in the world today is the collection of web pages on the Internet. While much of the Web is filled with pornographic images, magazine articles, and product advertisements, there is a staggering amount of personal information as well: individual home pages, email messages, and postings to the Usenet. This record can be automatically searched for revealing disclosures, unintentional admissions of guilt, or other kinds of potentially valuable information.

Back before the explosive growth of the World Wide Web, Rick Gates, a student and lecturer at the University of Arizona, was interested in exploring the limits of the Internet database. In September 1992, he created the Internet Hunt, a monthly scavenger hunt for information on the Net. Early hunts had participants locate satellite weather photographs or the text of White House speeches. The hunt was especially popular among librarians, who at the time were trying to make the case that the Internet could be a valuable reference tool.

In June 1993, Gates decided to run a different kind of hunt. It was the first one in which the goal was simply to find as much information as possible about the person behind an email address.

In one week, the hunt's 32 teams discovered 148 different pieces of information about the life of Ross Stapleton.[16] A computer at the University of Michigan reported that Stapleton had B.A. degrees in Russian Language and Literature and in Computer Science. A computer at the University of Arizona reported that he had a Ph.D. in Management Information Systems. A computer operated by the U.S. military's Defense Data Network (DDN) Network Information Center divulged Stapleton's current and previous addresses and phone numbers. And a conference brochure on a Gopher server operated by Computer Professionals for Social Responsibility listed Stapleton as one of the speakers and identified him as an analyst in the Office of Scientific and Weapons Research at the U.S. Central Intelligence Agency.

But the most revealing information the hunters assembled came from statements Stapleton himself had made. By scanning messages he had sent to the COM-PRIV mailing list (ironically, a list devoted to privacy issues), they learned that Stapleton used the OS/2 operating system and didn't have a fax machine. They learned that he was also affiliated with Georgetown University, where he was an adjunct professor who taught courses on the Information Age. They discovered that he subscribed to the Arlington Journal, the Chronicle of Higher Education, and Prodigy, and that he was a member of the AAASS (American Association for the Advancement of Slavic Studies). His Cleveland Freenet membership number was #ak287.

From the dedication in Stapleton's dissertation, "Personal Computing in the CEMA Community," the hunters discovered that his parents were named Tom and Shirle. From the header of another email message he had sent, they discovered that he was engaged and that his fiancée's name was Sarah Gray. Transcripts of Stapleton's comments at the Second Conference on Computers, Freedom, and Privacy were also unearthed.[17]

"Stepping back a bit and taking the hunt results as a whole, one can see that there's an awful lot of information that can be found on someone, even when restricted to freely accessible, publicly available Nets," said organizer Rick Gates in his report on the hunt. "I hope that people keep that in mind when they are posting to an email listserv or newsgroup. They are really adding to the sum total of the Nets, and what they have to say in some limited discussion of an [obscure] topic may be around for a long time."

An odd side effect of the global database is that it is easier to seek out information on people who have unique or unusual names. For instance, I tried searching the Internet in February 1998 for the phrase "Tom and Shirle." HotBot, an Internet search engine, found the word "Tom" on 1,833,334 pages and the word "And" on 63,502,825 pages. But the word "Shirle" was on just 333 pages, and the phrase "Tom and Shirle" was on six pages—all of which, it turns out, were copies of Gates's June 1993 report.

"I was pleasantly surprised to see the amount of information that I myself put out that they managed to find," said Stapleton when I interviewed him for this chapter. "Nothing came out during the hunt that I would have said alarmed me." But Stapleton had been worried that somebody at the CIA might be angry that he had revealed his name and employer in so many public forums. "It was only going to be a matter of time before somebody at work said, 'Hey, what have you been doing?'"

Perhaps what's most remarkable about the June 1993 Internet Hunt is that it no longer seems remarkable that such a detailed profile of a person could be constructed from publicly available sources. The explosion of online information sources, combined with advertiser-supported search-and-retrieval services like Yahoo, Lycos, and AltaVista, has made it easy to assemble these kinds of detailed profiles. Indeed, several services, such as DejaNews and HotBot, specifically advertise this ability.

