Database Nation

Chapter 4. What Did You Do Today?

When I was a teenager, I tried keeping a diary. I took out my pen every night before I went to sleep and wrote down the details of the previous day. I had just started dating and soon the book’s pages were filled with stories of my teenage romances: I’d write down who I liked and who I didn’t; who I had seen at school and who I had talked to on the phone. And, of course, I wrote down the details of my dates themselves: who they were with, where we had gone, what we had eaten, and what we had done.

After a month or so I had created quite an impressive historical record of my teenage exploits. But as time passed, my entries started getting shorter and shorter. It was just too much work to write down all of the details. Ultimately, my project collapsed under the weight of its own data.

Keeping that diary in today’s world would be much easier. Every time I buy something with a credit card, I get back a little yellow slip telling me the exact time and location of my purchase. I get a much more detailed receipt at my neighborhood supermarket that lists the name and size of everything in my shopping cart. My airline’s frequent flyer statement lists every city that I’ve flown to over the past year. Should I accidentally throw out the statement, all of this information is stored safely in numerous computer databanks.

Even my telephone calls are carefully recorded, tabulated, and presented to me at the end of each month. I remember in college when my girlfriend broke up with me during a long-distance phone call. We talked for 20 minutes, then she hung up. I called her back again and again; I got her answering machine each time. A few weeks later, the phone bill came in the mail, and there were the calls: one for 20 minutes, and then five calls in rapid succession, each one lasting just 15 seconds.

But by far the most detailed records of my life reside on my computer’s hard drive: my stored email messages, going back to my freshman year in college. All told, there are more than 600 megabytes of information, roughly 315,000 pages of double-spaced text, or more than 50 pages of text for every day since September 3, 1983, when I got my first email account at MIT.

“Keep all your old email messages,” my friend Harold told me just before I graduated. “When historians look back at the 1980s, we are the ones they’re going to be writing about.” And he was right: with keyword searching and advanced text-processing algorithms, it will be a simple matter for some future historian to assemble a very accurate record of my life as a college student—and my life ever since—by examining the written electronic record I’ve left behind.

But this archive of facts and feelings is a rapier that can slice two different ways. Besides keeping my own digital diary, I have been casting a vast “data shadow” that reveals the secrets of my daily life to anyone who can read it.

Alan Westin coined the term data shadow in the 1960s. Westin, a professor at Columbia University in New York, warned that credit records, bank records, insurance records, and other information that made up America’s emerging digital infrastructure could be combined to create a detailed digital dossier. The metaphor, with its slightly sinister feeling, was uncannily accurate: just as few people are aware of where their shadows fall, few data subjects in the future, Westin conjectured, would be able to keep track of their digital dossiers.

In the three decades that have passed since then, the data shadow has grown from an academic conjecture to a concrete reality that affects us all.

We stand at the brink of an information crisis. Never before has so much information about so many people been collected in so many different places. Never before has so much information been made so easily available to so many institutions in so many different ways and for so many different purposes.

Unlike the email that’s stored on my laptop, my data shadow is largely beyond my control. Scattered across the computers of a hundred different companies, my shadow stands at attention, shoulder-to-shoulder with an army of other data shadows inside the databanks of corporations and governments all over the world. These shadows are making the discovery of human secrets routine. They are forcing us to live up to a new standard of accountability. And because the information that makes up these shadows is occasionally incorrect, they leave us all vulnerable to punishment or retaliation for actions that we did not even commit.

The good news is that we can fight back against this wholesale invasion of personal privacy. We can fight to stop the capturing of everyday events. And where capture is inevitable, we can establish strong business practices and laws that guarantee the sanctity of our privacy—protection for our shadows to live by. We have done so before. All that’s needed is for people to understand how this information is being recorded, and how to make that recording stop.

The Information Crisis

As an experiment, make a list of the data trails that you leave behind on a daily basis. Did you buy lunch with a credit card? Write that down. Did you buy lunch with cash, but visit the automatic teller machine (ATM) beforehand? If so, then that withdrawal is part of your data shadow as well. Every long-distance phone call, every message you leave in a voice mailbox, and every web page you access on the Internet becomes part of your comprehensive data profile.

You are more likely to leave records if you live in a city, if you pay for things with credit cards, and if your work requires that you use a telephone or a computer. You will leave fewer records if you live in the country or if you are not affluent. This is really no surprise: detailed records are what makes the modern economy possible.

What is surprising, though, is the amount of collateral information that these records reveal. Withdraw cash from an ATM, and a computer records not just how much money you took out, but the fact that you were physically located at a particular place and time. Make a telephone call to somebody who has Caller ID, and a little box records not just your phone number (and possibly your name), but also the exact time that you placed your call. Browse the Internet, and the web server on the other side of your computer’s screen doesn’t just record every page that you download—it also records the speed of your computer’s modem, the kind of web browser you are using, and even your geographical location.

There’s nothing terribly new here, either. In 1986, John Diebold wrote about a bank that seven years earlier

had recently installed an automatic teller machine network and noticed “that an unusual number of withdrawals were being made every night between midnight and 2:00 a.m.”...Suspecting foul play, the bank hired detectives to look into the matter. It turns out that many of the late-night customers were withdrawing cash on their way to a local red light district!^[1]

An article about the incident that appeared in the Knight News Service observed: “there’s a bank someplace in America that knows which of its customers paid a hooker last night.”^[2] (Diebold, one of America’s computer pioneers in the 1960s and 1970s, had been an advocate of the proposed National Data Center. But by 1986, he had come to believe that building the Data Center would have been a tremendous mistake, because it would have concentrated too much information in one place.)

I call records such as banks’ ATM archives hot files. They are juicy, they reveal unexpected information, and they exist largely outside the scope of most people’s understanding.

Over the past 15 years, we’ve seen a growing use of hot files. One of the earliest cases that I remember occurred in the 1980s, when investigators for the U.S. Drug Enforcement Administration started scanning through the records of lawn-and-garden stores and correlating the information with data dumps from electric companies. The DEA project was called Operation Green Merchant; by 1993, the DEA, together with state and local authorities, had seized nearly 4,000 growing operations, arrested more than 1,500 violators, and frozen millions of dollars in illicitly acquired profits and assets.^[3] Critics charged that the program was a dragnet that caught both the innocent and the guilty. The investigators were searching out people who were clandestinely raising marijuana in their basements. While the agents did find some pot farmers, they also raided quite a few innocent gardeners, including one who lived next to an editor at the New York Times. The Times eventually wrote an editorial, but it didn’t stop the DEA’s practices.

Americans got another dose of hot file surprise in the fall of 1987, after President Ronald Reagan nominated Judge Robert Bork to the Supreme Court of the United States. Bork’s nomination was fiercely opposed by women’s groups, who said that the judge had a history of ruling against women’s issues; they feared that Bork would be the deciding vote to help the Court overturn a woman’s right to an abortion. Looking for dirt, a journalist from Washington, D.C.’s liberal City Paper visited a video rental store in Bork’s neighborhood and obtained a printout from the store’s computer of every movie that Bork had ever rented there. The journalist had hoped that Bork would be renting pornographic films. As it turned out, Bork’s tastes in video veered towards mild fare: the 146 videos listed on the printout were mostly Disney movies and Hitchcock films.

Nevertheless, Bork’s reputation was still somewhat damaged. Some published accounts of the Bork story, and many offhand remarks at cocktail parties, omit the fact that the journalist came up empty in the search for pornography. Instead, these accounts erroneously give the impression that Bork was a fan of porn, or at least allow the reader to draw that conclusion.

The problem with hot files, then, is that they are too hot: on the one hand, they reveal information about us that many people think a dignified society keeps private; on the other hand, they are easily misinterpreted. And it turns out that these records are also easily faked: if the clerk at the video rental store had wanted to do so, that person could easily have added a few dozen porno flicks to the record, and nobody could have proved that the record had been faked.

As computerized record-keeping systems become more prevalent in our society, we are likely to see more and more cases in which the raw data collected by these systems for one purpose is used for another. Indeed, advancing technology makes such releases all the more likely. In the past, computer systems simply could not store all of the information that they could collect: it was necessary to design systems so that they would periodically discard data when it was no longer needed. But today, with the dramatic developments in data storage technology, it’s easy to store information for months or years after it is no longer needed. As a result, computers are now retaining an increasingly complete record of our lives, as they did with Judge Bork’s video rental records. Ask yourself this: what business did the video rental store have keeping a list of the movies that Bork had rented, after the movies had been returned?

This sea of records is creating a new standard of accountability for our society. Instead of relying on trust or giving people the benefit of the doubt, we can now simply check the record and see who was right and who was wrong. The ready availability of personal information also makes things easier for crooks, stalkers, blackmail artists, con men, and others who are up to no good. One of the most dramatic cases was the murder of actress Rebecca Schaeffer in 1989. Schaeffer had gone to great lengths to protect her privacy. But a crazed 19-year-old fan, who allegedly wanted to meet her, hired a private investigator to find out her home address. The investigator went to California’s Department of Motor Vehicles, which at the time made vehicle registration information available to anyone who wanted it, since the information was part of the public record. The fan then went to Schaeffer’s house, waited for four hours, and shot her once in the chest when she opened her front door.^[4]

False Data Syndrome

Another insidious problem with this data sea is something I call false data syndrome. Because much of the information in the data sea is correct, we are predisposed to believe that it is all correct, a dangerous assumption that is all too easy to make. The purveyors of the information themselves often encourage this kind of sloppy thinking by failing to acknowledge the shortcomings of their systems.

For example, in 1997, the telephone company NYNEX (now part of Bell Atlantic) launched an aggressive campaign to sell the new Caller ID service to its subscribers. With the headline “See Who’s Calling Before You Pick Up the Phone,” the advertisement read:

Caller ID lets you see both the name and number of the incoming call so you can decide to take the call now or return it later. Even if the caller doesn’t leave a message, your Caller ID box automatically stores the name, number, and time of the incoming call. Caller ID also works with Call Waiting, so you can see who’s calling even while you’re talking to someone else.^[5]

Clearly, NYNEX was confusing human identities with telephone numbers. Caller ID doesn’t show the telephone number that belongs to the person who is making the call—it shows the number of the telephone from which the call is being placed. So-called “enhanced” Caller ID services that display a name and number don’t really display the caller’s name—they display the name of the person who is listed in the telephone book. If I make an obscene call from your house during a party, or if I use your telephone to make a threat on the life of the president of the United States (a federal crime), Caller ID will say that you are the culprit—not me.
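To make the distinction concrete, here is a minimal sketch of what an “enhanced” Caller ID box can and cannot know. The directory contents, phone numbers, and function names are hypothetical; the only point is that the display is computed from the originating line, never from the person holding the handset.

```python
from dataclasses import dataclass

# Hypothetical directory: each *line* is listed under one subscriber's name.
DIRECTORY = {
    "617-555-0100": "Pat Smith",
    "617-555-0199": "Lee Jones",
}


@dataclass
class Call:
    placed_by: str   # the person who actually dialed; the network never sees this
    from_line: str   # the line the call was placed from
    time: str


def caller_id_display(call: Call) -> tuple:
    """Return (name, number, time) as an enhanced Caller ID box would show them.

    The name is looked up from the originating line; call.placed_by is
    deliberately never consulted, because the phone network cannot know it.
    """
    name = DIRECTORY.get(call.from_line, "UNAVAILABLE")
    return (name, call.from_line, call.time)


# A party guest, Lee, borrows Pat's phone to make a call.
call = Call(placed_by="Lee Jones", from_line="617-555-0100", time="11:14 PM")
print(caller_id_display(call))  # shows Pat Smith, not Lee Jones
```

Nothing in the data the box receives could let it do better: the caller’s identity simply is not part of the signal.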

The Tracking Process: How Our Information Is Turned Against Us

Nobody set out to build a society in which the most minute details of everyday life are permanently recorded for posterity. But this is the future that we are marching towards, thanks to a variety of social, economic, and technological factors.

Humans are born collectors. Psychologically, it’s much easier to hold on to something than to throw it away. This is all the more true for data. Nobody really feels comfortable erasing business correspondence or destroying old records—you never know when something might be useful. Advancing technology is making it possible to realize our collective dream of never throwing anything away—or at least never throwing away a piece of information.

The first computer that I bought in 1978 stored information on cassette tapes. I could fit 200 kilobytes on a 30-minute cassette, if I was lucky. The computer that I use today has an internal hard disk that can store 6 gigabytes of information—a 30,000-fold increase in just two decades. And this story is hardly unique: all over the world, businesses, governments, and individuals have seen similar improvements in their ability to store data. As a civilization, we’ve used this newfound ability to store more and more minute details of everyday existence. We are building the world’s datasphere: a body of information that describes the Earth and our actions upon it.
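The arithmetic behind that figure, assuming decimal units (1 gigabyte = 10^9 bytes, 1 kilobyte = 10^3 bytes) and treating the span as exactly 20 years, works out as follows; the average annual growth factor is the twentieth root of the overall increase:

```latex
\frac{6\ \text{GB}}{200\ \text{KB}}
  = \frac{6 \times 10^{9}\ \text{bytes}}{2 \times 10^{5}\ \text{bytes}}
  = 3 \times 10^{4},
\qquad
\left(3 \times 10^{4}\right)^{1/20} = e^{(\ln 30000)/20} \approx e^{0.515} \approx 1.67
```

In other words, under those assumptions storage capacity grew by roughly 67 percent a year, doubling about every 16 months.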

Building the world’s datasphere is a three-step process—one that we’ve been blindly following without considering its ramifications for the future of privacy. First, industrialized society creates new opportunities for data collection. Next, we dramatically increase the ease of automatically capturing information into a computer. The final step is to arrange this information into a large-scale database so it can be easily retrieved at a moment’s notice.

Once the day-to-day events of our lives are systematically captured in a machine-readable format, this information takes on a life of its own. It finds new uses. It becomes indispensable in business operations. And it often flows from computer to computer, from business to business, and between industry and government. If we don’t step back and stop the collection and release of this data, we’ll soon have a world in which every moment and every action is permanently “on the record.”

Step 1: Make Data Collectable

The first step to building the global datasphere is to create information worth collecting. Consider a forest: by itself, a mountain of trees has no data. Now go through the forest and number every tree, estimate its age and height, and survey its location, and you have created an extremely valuable data set for both environmentalists and the timber industry.

I got a very good introduction to this first step in 1988, when I visited a BASF floppy disk manufacturing plant located just outside Boston. As part of the manufacturing process, I learned, each floppy disk is stamped with a code called a lot number. Pick up a floppy disk and flip it over, and you’re likely to find these same numbers today: A2C5114B, or S2078274, or 01S1406. These codes identify the manufacturer of the disks, the factory, the particular machine where the disk was created, and the date and time that the lot was started. Sometimes the information is encoded directly into the number. Other times, the lot number merely refers to an entry in a logbook or a process control system. Either way, decoding a lot number usually requires proprietary information that manufacturers are rarely willing to share with the general public.
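As an illustration only, the sketch below contrasts the two styles of lot number just described. The fixed-width format that `decode_lot` understands is invented for this example (no real manufacturer’s scheme is implied), and the `LOGBOOK` entry keyed by one of the sample codes is likewise hypothetical.

```python
from datetime import datetime, timedelta

# HYPOTHETICAL fixed-width lot-number format, invented for this example:
#   [0]    plant code (a letter)
#   [1:3]  machine number
#   [3:5]  last two digits of the year (assumed to be 19xx)
#   [5:8]  day of the year, 001-366
#   [8:10] hour the lot was started, 00-23
def decode_lot(lot: str) -> dict:
    """Style 1: the information is encoded directly in the number."""
    if len(lot) != 10 or not lot[0].isalpha() or not lot[1:].isdigit():
        raise ValueError(f"not in the hypothetical format: {lot!r}")
    year = 1900 + int(lot[3:5])
    started = datetime(year, 1, 1) + timedelta(days=int(lot[5:8]) - 1,
                                               hours=int(lot[8:10]))
    return {"plant": lot[0], "machine": int(lot[1:3]), "started": started}


# Style 2: the number is an opaque key into the factory's own records.
# This logbook entry is hypothetical.
LOGBOOK = {
    "S2078274": {"plant": "B", "machine": 12, "started": datetime(1998, 3, 2, 6)},
}


def look_up_lot(lot: str) -> dict:
    return LOGBOOK[lot]  # raises KeyError for lots not in the logbook


print(decode_lot("B079821514"))
print(look_up_lot("S2078274"))
```

Without the format description or the logbook, both numbers are meaningless strings, which is exactly why manufacturers can keep them proprietary.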

The primary purpose of these lot numbers is quality control. If a run of bad disks turns up, the manufacturer can look at the lot number on the bad disks and figure out where they came from. By examining the factory’s records, a quality control engineer can find the exact piece of equipment that caused the problem—which is the first step to preventing the problem from recurring in the future. Ultimately, this saves the company money and improves its reputation.

Once you know how to recognize lot numbers, you’ll soon start seeing lot numbers everywhere: on a candy bar wrapper, a bottle of pills, or the rim of a flashlight. Some objects are so important that each one gets its own tracking number, in which case the number is called a serial number. Turn over the mouse that’s connected to your desktop computer and you’ll find one. There’s another serial number on your computer itself, as well as on many of the individual components inside. It’s all quite ironic. In the early days of the Industrial Revolution, one of the biggest technical challenges that engineers faced was producing functionally identical, interchangeable parts. Today, we have become so good at making things indistinguishable that we now need to imprint each with its own code so that we can tell them apart.

Lot numbers and serial numbers all serve a fundamental purpose: by making seemingly identical things distinguishable, the numbers make the history of the things recordable. But once inscribed, these codes can be used for much more than simple quality control: increasingly, lot numbers and serial numbers are being used for law enforcement.

Lot numbers can prove vital to a product-tampering investigation, for example. If tampered products at different stores all come from the same lot, then the tampering probably took place at the factory. If tampered products all come from different lots—possibly manufactured at different plants—then the tampering almost certainly took place at the store or in the home.
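The reasoning in that paragraph amounts to a simple rule of thumb. A toy version, assuming each tampered package is reported as a hypothetical (store, lot number) pair, might look like this; it encodes only the two cases the text describes and reports everything else as inconclusive.

```python
def likely_tampering_site(items: list) -> str:
    """items: (store, lot_number) pairs, one per tampered package found."""
    stores = {store for store, _ in items}
    lots = [lot for _, lot in items]
    if len(items) > 1 and len(set(lots)) == 1 and len(stores) > 1:
        # One lot turning up at many stores: the shared point is the factory run.
        return "factory (probably)"
    if len(items) > 1 and len(set(lots)) == len(lots):
        # Every package from a different lot: no production run in common.
        return "store or home (almost certainly)"
    return "inconclusive"


print(likely_tampering_site([("Store 1", "L1"), ("Store 2", "L1"), ("Store 3", "L1")]))
print(likely_tampering_site([("Store 1", "L1"), ("Store 1", "L2"), ("Store 1", "L3")]))
```

Real investigations weigh far more evidence than this, but the rule only works at all because each package carries a recorded history.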

One of the most successful tracking numbers in recent years is the Vehicle Identification Number (VIN), a 17-character code that is stamped on the dashboard, engine, and axle of every car and truck manufactured in the world. The VIN was created in the 1970s by a coalition of auto makers and national governments who wanted a worldwide standard, according to Thomas Carr, manager of passenger safety regulations for the American Automobile Manufacturers Association.^[6] Sixteen of the VIN’s characters identify the manufacturer, the country of manufacture, the make and model of the vehicle, the assembly plant where it was built, the year it was built, information on the car’s restraint system, the kind of transmission and rear axle used in trucks, and a six-digit sequential code.

The remaining character, which in North American VINs occupies the ninth position, is special. It’s called a check digit. This digit doesn’t contain any information of its own; instead, it is computed from the other characters. The digit makes the VIN self-verifying, letting a computer automatically detect a number of common typographical errors, such as switching two digits around or hitting an adjacent key on a computer keyboard. Since the VIN is the key that is used to index all of the records for a particular motor vehicle, explains Carr, being able to verify that a VIN has been correctly typed is very important.
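The sketch below implements the check-digit calculation used for North American VINs, where the check digit sits in the ninth position: each letter is transliterated to a number, each position is multiplied by a fixed weight, and the sum is taken modulo 11, with a remainder of 10 written as “X”. It assumes that scheme specifically; VINs issued elsewhere do not always carry a check digit.

```python
# Letter-to-number transliteration used for the North American VIN
# check digit. The letters I, O, and Q never appear in a VIN.
VALUES = {str(d): d for d in range(10)}
VALUES.update(zip("ABCDEFGH", range(1, 9)))
VALUES.update(zip("JKLMN", range(1, 6)))
VALUES.update({"P": 7, "R": 9})
VALUES.update(zip("STUVWXYZ", range(2, 10)))

# Weight for each of the 17 positions; the check digit itself (position 9)
# is weighted 0 so that it does not affect its own calculation.
WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2)


def vin_check_digit(vin: str) -> str:
    """Compute the expected check digit: weighted sum mod 11, 10 -> 'X'."""
    vin = vin.upper()
    if len(vin) != 17:
        raise ValueError("a VIN has exactly 17 characters")
    try:
        total = sum(VALUES[ch] * w for ch, w in zip(vin, WEIGHTS))
    except KeyError as err:
        raise ValueError(f"illegal VIN character: {err.args[0]!r}") from None
    remainder = total % 11
    return "X" if remainder == 10 else str(remainder)


def vin_is_valid(vin: str) -> bool:
    """True if the ninth character matches the computed check digit."""
    try:
        # vin_check_digit runs first, so a short string raises ValueError
        # before the [8] index is ever attempted.
        return vin_check_digit(vin) == vin.upper()[8]
    except ValueError:
        return False


print(vin_is_valid("1M8GDM9AXKP042788"))  # a commonly cited sample VIN -> True
print(vin_is_valid("1M8GDM9AXKP402788"))  # same VIN, two digits swapped -> False
```

Because neighboring positions carry different weights, swapping two unequal characters usually changes the weighted sum, which is how the transposition in the second example is caught.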

VINs are used to track the car throughout the entire production process. After the car leaves the plant, VINs are used by governments to keep track of who owns each car, both to collect taxes and to help return stolen cars to their rightful owners. And in recent years, VINs have found a new role—solving car and truck bomb cases. In the bombings of both the World Trade Center in New York City and the Murrah Federal Building in Oklahoma City, investigators were able to quickly locate the axles of the trucks that were blown up. The VINs that were stamped on the axles allowed investigators to determine the trucks’ owners, which allowed them to determine where the trucks had been rented—which in both cases led them to the identities of the bombers.

Step 2: Make Data Machine-Readable

Automatic data collection is the second big step needed to create the datasphere. Automated systems read a piece of information and feed it directly into a computer, without human intervention. Although automated systems can be expensive to set up, once they are operational, they dramatically lower the cost of data collection, making it possible to create huge data sets and to keep them up to date. As a result, when a few major players in an industry start to adopt an automated system, the entire industry quickly follows.

The U.S. banking industry was one of the first major segments of our economy to adopt machine-readable codes. In 1963, a few banks started printing checks using special magnetic ink, so that computers could automatically read the nine-digit bank routing numbers, account numbers, and check numbers stamped across each check’s bottom. It was a good idea: by 1969, 90% of the checks in the United States were printed with the shiny black numbers, greatly decreasing the time required to process them.^[8] In the 1970s, the banking industry started adding magnetic strips to credit cards so the little pieces of plastic could be swiped through a reader. Before then, the numbers had to be manually entered into a computer after they were transferred to a credit card slip using a piece of carbon paper and a roller.

A tracking number that has had a very rough start is the Processor Serial Number (PSN) that Intel introduced with its Pentium III microprocessor. Intel originally designed the serial number in the Pentium III microprocessor to help the company detect “over-clocking” of CPUs (i.e., when a 500 MHz chip is sold as a 600 MHz chip) and to help large companies track computers as they move around through an organization. When upper management found out about the feature, the PSN was given an “e-commerce” spin: Intel suggested that web sites could use special software to read the PSN of their customers’ computers over the Internet.

When Intel announced the PSN in January 1999, the company decided to emphasize the “e-commerce” feature, rather than the asset-tracking capability. Within a week, several consumer groups organized a boycott against the microprocessor, saying that the more likely use of the PSN would be to silently track Internet users as they click through web sites. Meanwhile, cryptography expert Bruce Schneier published a scathing article in which he attacked the PSN because it was a number that could not be obtained in a secure fashion. He wrote:

If a remote Web site queries a processor ID, it has no way of knowing whether the number it gets back is a real ID or a forged ID. Likewise, if a piece of software queries its processor’s ID, it has no way of knowing whether the number it gets back is the real ID or whether a patch in the operating system trapped the call and responded with a fake ID. Because Intel didn’t bother creating a secure way to query the ID, it will be easy to break the security.^[7]

Other industries have been slower to adopt machine-readable systems. It wasn’t until the mid-1990s that General Motors started supplementing the original VIN plates with machine-readable bar codes. Unlike the old VIN, which could only be read up close by a human, the new bar code can be read from more than 20 feet away using a high-speed laser scanner. Once in place, the bar code VIN quickly gained adherents. One company that jumped on the bandwagon was the car rental agency Avis, which now uses laser scanners to automatically track cars as they are returned at the company’s drop-off locations. In the coming years, these machine-readable VINs will increasingly be a part of most drivers’ lives. For example, urban garages might use the bar codes to automatically open gates for their monthly patrons. Other companies have developed computerized vision systems that can read the license plate of a stopped or moving car, creating another system for automatically identifying automobiles at a distance.

Figure 4-1. License Plate Reader

Electronic tags aren’t the only way to track a vehicle on the open road. The U.S. Customs Service has deployed license plate readers at many border crossings between the U.S. and Canada. These systems use a high-resolution video camera to locate and capture the image of a car’s license plate in just milliseconds. From that image, the Perceptics license plate reader can determine both the plate’s number and the issuing state or province. Says the company, “With our License Plate Reader, every highway is an open book.” [Photos courtesy Perceptics]

Moving away from magnetic and optical systems, the newest machine-readable tags are scanned using radio waves. The technology, called RFID (short for Radio Frequency Identification Device), consists of two parts: a tiny silicon chip with a small radio antenna, called the tag, and a gun-shaped reader. Each chip is manufactured with a unique code. Point the reader at the chip, and the chip’s code appears on the reader’s display. The code is also sent to an attached computer. The chips have no moving parts, no batteries to wear out, and an indefinite lifetime.

Figure 4-2. Radio Frequency Identification Devices

Radio Frequency Identification Device (RFID) systems make it possible to embed a computer-readable serial number in an automobile, a gas cylinder, a pet, or even a human being. The system is based on an electronic tag that is stimulated using a low-energy radio signal. Once energized, the tag transmits its serial number. RFID tags are made by many different manufacturers; some RFID tags can be read from a distance of several feet. RFID systems have been used for ski tags, employee badges, and tracking animals. A similar technology is used in most highway automatic toll collection systems. Since these systems are silent and passive, they can be read without the knowledge (or the consent) of the person carrying the radio tag.

Like other identification systems, RFID systems don’t actually identify a car, a pet, or a person: they simply identify the tag. And since no cryptography is employed by today’s RFID systems, an RFID identification response can be eavesdropped, falsified, or otherwise forged. The tags can also be read without the owner’s knowledge. Since today’s tags have no memory, there is no way to determine how many times a tag has been read, or by whom. Neither the producers nor the users of these systems seem to be concerned with the shortcomings of the security that these systems provide. [Photos courtesy Trovan]

When you point an RFID reader at a transponder and pull the trigger, the gun fires a burst of radio frequency energy in the direction of the chip. The transponder’s antenna picks up this energy and converts it into an electric current, which powers both the transponder’s microchip and its tiny on-board radio transmitter. The transponder then sends back the chip’s unique code (today’s chips use a 64-bit code) on another radio frequency.
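The sketch below is a toy model of that exchange, not any manufacturer’s actual protocol: the class and function names are hypothetical, and the radio link is reduced to a function call. It shows why a tag that simply recites a fixed 64-bit code cannot protect against the forgery noted in the caption to Figure 4-2.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PassiveTag:
    """Toy passive tag: a fixed 64-bit code and nothing else.

    Frozen to mirror a tag with no writable memory: it cannot count how
    many times it has been read, or record by whom.
    """
    code: int

    def respond(self, powered: bool) -> Optional[int]:
        # No battery: the tag can answer only while the reader's burst
        # of radio-frequency energy is powering it.
        return self.code if powered else None


def read(tag: PassiveTag) -> Optional[int]:
    """The reader fires its burst and accepts whatever code comes back."""
    return tag.respond(powered=True)


print(f"{2**64:,} possible 64-bit codes")

genuine = PassiveTag(code=secrets.randbits(64))
overheard = read(genuine)              # any reader gets the same answer
counterfeit = PassiveTag(code=overheard)
print(read(counterfeit) == read(genuine))  # the reader cannot tell them apart
```

A 64-bit code space is more than large enough to give every tag its own number; the weakness is not scarcity of codes but the absence of any secret that a copy would lack.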

Several companies make RFID systems. One of the largest is Trovan, based in the United Kingdom. Trovan’s largest device is about the size of a quarter and can be read from two feet away; the smallest is the size of a grain of rice. Readable from 18 inches, the tiny tag is designed to be sewn into the lining of clothing for inventory tracking. Trovan also makes a special implantable tag which comes in a presterilized, ready-to-use disposable syringe; it can be tucked under the skin of an animal in less than 20 seconds.^[9]

In England, Yamaha dealers are using Trovan to help fight motorcycle theft. For £65 (about U.S.$100), you can have Trovan chips implanted into your bike’s frame, wheels, tank, and seat. If the bike is stolen or stripped, the parts can be identified when somebody comes in trying to sell them.

In the United States, RFID systems are being used for asset management, a technique through which businesses cut costs by carefully managing the items they have already bought. One application is the tracking of gas cylinders. Once an RFID device is dropped into a small hole drilled in the neck of a gas cylinder, it becomes possible to accurately track the location of that cylinder as it is moved between the plant and the customer. Other companies have embedded RFID devices in hand-held tools, which workers are then required to check out and check back in like library books.

Meanwhile, implantable tags are being used by zoos around the world to track exotic animals. And in North America, they’re being used to track pets: by the summer of 1997, at least 200,000 cats and dogs in the U.S. had been implanted with some form of RFID. Several companies now operate national databases that match pets’ chip ID numbers with their owners’ names, addresses, and phone numbers. Organizations like the ASPCA in New York City, San Diego County in California, and the cities of Minneapolis and St. Paul are buying readers. Stray animals found on the street are now being scanned when they are brought to a shelter.

As these cases show, the power of RFID is that once the radio tag is implanted into an object, the tag becomes a part of that object. A serial number that’s on a gun can be filed down or etched away with acid. Cars can be stripped of their VINs. Tattoos can be overgrown with hair, or simply covered by clothing. But put a chip on the inside and the serial number becomes invisible, indelible, and detectable at a distance.

Although the obvious motivation for tracking is to prevent loss, other advantages of increased control and knowledge soon come to light. Some U.S. farmers have discovered that once an animal is given a serial number, it becomes possible to keep highly accurate long-term records. By tracking an animal from birth to slaughter, keeping detailed records of each animal’s vaccination history, feed, weight, and handling, and even performing an occasional ultrasound scan, farmers can apply scientific management techniques to their overall operation. Ultimately, the extra work can increase the market value of an animal by approximately $700 to $1000. Meanwhile, the U.S. Department of Agriculture may soon mandate the electronic tracking of cattle in order to combat disease.^[10]

Step 3: Build a Big Database

As the tag-wielding U.S. farmers have learned, a good database is what marks the difference between disorganized data and a usable collection of information. But the organization of a database, and the policies that control access to the information the database contains, can dramatically impact the privacy implications of the entire tracking enterprise.

Consider the case of Electronic Toll Collection (ETC). Over the past decade, systems that let automobile and truck drivers pay their highway and bridge tolls electronically have been enthusiastically adopted around the world. The reason: ETC systems put an end to traffic jams around toll plazas. Instead of requiring drivers to stop and toss a few coins into a basket or hand a bill to a toll collector, most ETC systems use a radio tag to uniquely identify a car’s account, from which the toll is automatically deducted.

In Norway, Micro Design ASA installed one of the earliest systems on a highway north of Trondheim in 1988. The technology has improved rapidly since then. Today, a system manufactured by Saab Combitech, Sweden, can read an electronic tag in less than 10 milliseconds when the vehicle is traveling at speeds up to 100 miles per hour. The Saab system can also determine the vehicle’s speed by measuring the Doppler shift of the returning radio signal.
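The text does not say how the Saab reader turns a Doppler shift into a speed. As a minimal sketch, assuming the tag’s reply behaves like a radar reflection (so the signal is shifted once on the way out and once on the way back) and the car is moving straight toward the antenna, the textbook two-way relation is Δf = 2vf₀/c. The 5.8 GHz carrier below is an assumed example value, not a figure from the text, and the function names are hypothetical.

```python
SPEED_OF_LIGHT_MPS = 299_792_458.0  # metres per second (exact by definition)
MPS_PER_MPH = 0.44704               # 1 mile per hour in metres per second (exact)


def doppler_shift_hz(speed_mps: float, carrier_hz: float) -> float:
    """Two-way Doppler shift for a target moving straight toward the reader.

    The signal is shifted once on the way out and once on the way back,
    hence the factor of 2. Valid only for speeds far below the speed of light.
    """
    return 2.0 * speed_mps * carrier_hz / SPEED_OF_LIGHT_MPS


def speed_from_shift_mps(shift_hz: float, carrier_hz: float) -> float:
    """Invert the two-way Doppler relation to recover the radial speed."""
    return shift_hz * SPEED_OF_LIGHT_MPS / (2.0 * carrier_hz)


if __name__ == "__main__":
    carrier = 5.8e9  # ASSUMED example carrier frequency, not taken from the text
    shift = doppler_shift_hz(100 * MPS_PER_MPH, carrier)
    recovered_mph = speed_from_shift_mps(shift, carrier) / MPS_PER_MPH
    print(f"shift at 100 mph: {shift:.0f} Hz; recovered speed: {recovered_mph:.1f} mph")
```

At the assumed 5.8 GHz carrier, a car doing 100 mph shifts the signal by roughly 1.7 kHz. A real reader would also have to correct for the angle between the car’s path and the antenna beam, which this sketch ignores.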

In 1994, the New York-area Triborough Bridge and Tunnel Authority (TBTA) installed an ETC system called E-ZPass at tollbooths on the Verrazano Narrows Bridge. After some early snafus, E-ZPass was soon fulfilling its mission, boosting the number of cars that each lane could handle from 250 to 1000 per hour. The public responded enthusiastically: during its first two years of operation, TBTA issued 550,000 E-ZPass tags. “Each work day, we collect 280,000 electronic tolls, or 42 percent of the total transactions,” TBTA president Michael Ascher told a trade publication in March 1997.^[11] A similar system, E-Pass, has been enthusiastically adopted by Florida drivers on the Orlando-Orange County Expressway.

Among state and federal highway administrators, the big issues with these ETC systems are cost, reliability, and interoperability. Many states have adopted systems that use incompatible tags: E-ZPass uses a windshield-mounted tag, while Florida’s E-Pass system uses a radio transponder the size of a flashlight mounted under the car’s front bumper. Within a few years, highway administrators hope the U.S. will adopt a single national system that will let a car travel from California to New York, paying all of the intervening bridge and highway tolls electronically.

But administrators have not focused on the privacy implications of the systems they are deploying. And those implications are staggering. The ETC systems maintain a detailed record of each time each car pays a toll. Officially, the ETC systems keep this information so they can send drivers a monthly statement showing them where their money is going. But the database is a gold mine of personal information that has uses far beyond simple accounting. A restaurant could scan it to build a list of everyone who drives by its place of business. A private investigator could use this database to track the movements of an errant spouse. Reporters could track celebrities, and crooks could use it to target a victim.
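To make the point concrete, here is a minimal sketch of a toll-crossing log. The table layout, tag IDs, plaza names, and timestamps are all hypothetical; the text does not describe how any real ETC system stores its records. The same table that produces a monthly statement answers both the restaurant’s question and the investigator’s question with a single query each.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory toll-crossing log filled with HYPOTHETICAL sample rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE crossings (tag_id TEXT, plaza TEXT, crossed_at TEXT)")
    db.executemany(
        "INSERT INTO crossings VALUES (?, ?, ?)",
        [
            ("TAG-001", "Plaza A", "1997-06-02 08:14"),
            ("TAG-001", "Plaza C", "1997-06-02 08:41"),
            ("TAG-002", "Plaza A", "1997-06-02 12:05"),
            ("TAG-001", "Plaza C", "1997-06-02 23:10"),
            ("TAG-003", "Plaza B", "1997-06-03 07:55"),
        ],
    )
    return db


def tags_passing(db: sqlite3.Connection, plaza: str) -> list:
    """Every account that drove past one plaza (the restaurant's marketing list)."""
    rows = db.execute(
        "SELECT DISTINCT tag_id FROM crossings WHERE plaza = ? ORDER BY tag_id",
        (plaza,),
    )
    return [row[0] for row in rows]


def movements_of(db: sqlite3.Connection, tag_id: str) -> list:
    """One account's crossings in time order (the private investigator's report)."""
    rows = db.execute(
        "SELECT plaza, crossed_at FROM crossings WHERE tag_id = ? ORDER BY crossed_at",
        (tag_id,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(tags_passing(db, "Plaza A"))
    print(movements_of(db, "TAG-001"))
```

Nothing in the sketch is specific to toll collection: once each crossing is stored with an account identifier and a timestamp, profiling a driver is an ordinary query.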

Once states are collecting large amounts of movement information, it is quite likely that it will be used and exploited. Already, cash-strapped state governments are selling their driver’s-license databases to companies like R. L. Polk, which are using the data to build marketing lists.^[12] But even if the information is not sold, its existence means that some bad guy might someday bribe a state employee to get at the juicy data.

Highway administrators don’t seem to be sensitive to these risks. In 1995, the Massachusetts Turnpike Authority (MTA) published a three-inch-thick Request for Proposals to contractors interested in selling electronic toll collection systems to the state. The word “privacy” didn’t appear. I called up John Judge, the MTA’s Director of Operations, to ask why.

“Privacy is a non-issue,” said Judge:

I think that is the experience nationwide, at least as it relates to electronic toll collection. Privacy has not been an issue that has emerged nationally. I think that [is] principally because it is a voluntary system. If you are of a mind where you might be concerned about privacy issues, you just don’t have to join the program, and can use the traditional toll collection methods. I don’t think that it is any more an issue than credit cards.^[13]

Distressingly, U.S. courts seem to agree with Judge, although for different reasons. On June 26, 1997, Justice Colleen McMahon ruled that the Triborough Bridge and Tunnel Authority had to turn over toll-crossing records to police whenever presented with a subpoena. Previously, the TBTA had required police to get a court order for release of the information, a requirement that McMahon said was too restrictive on police. Her reasoning was that the movements of E-ZPass holders were easily observed, and therefore the electronic records should be made public as well.^[14]

Positional information is also very much a part of the cellular telephone systems, which must track phones at all times so that calls can be delivered. In 1997, British Telecom announced that it was developing a mobile telephone that would report the caller’s location, to within 30 feet, to the person receiving the call. “Workers will no longer be able to phone the office pretending to be sick when they are at the beach, and movements of cheating spouses will be exposed,” enthused an article in the Electronic Telegraph.^[15] And as part of the U.S. 911 system, cellular providers must be able to locate 60% of all phones to within 150 meters by the year 2001. Like all positional information, this data has multiple uses. Besides allowing ambulances to reach a car wreck faster, it is increasingly requested by police when they serve wiretap orders on cellular providers.

The approach to vehicular privacy has been similar across the border in Canada. Ontario’s Highway 407 now has a sophisticated system for automatically billing automobile owners for the number of miles their vehicles drive on the public highway. The system uses a video camera to capture the image of the vehicle’s license plate. Tolls are assessed when automobile registrations are renewed: people who refuse to pay the bills won’t be allowed to renew.

The Biggest Database in the World

Probably the largest database in the world today is the collection of web pages on the Internet. While much of the Web is filled with pornographic images, magazine articles, and product advertisements, there is a staggering amount of personal information as well: individual home pages, email messages, and postings to the Usenet. This record can be automatically searched for revealing disclosures, unintentional admissions of guilt, or other kinds of potentially valuable information.

Back before the explosive growth of the World Wide Web, Rick Gates, a student and lecturer at the University of Arizona, was interested in exploring the limits of the Internet database. In September 1992, he created the Internet Hunt, a monthly scavenger hunt for information on the Net. Early hunts had the participants locate satellite weather photographs or the text of White House speeches. The hunt was especially popular among librarians, who were at the time trying to make the case that the Internet could be a valuable reference tool.

Figure 4-3. Electronic Toll Collection

This statement from the Orlando-Orange County Expressway Authority shows the comings and goings of a car as it travels along the state’s expressway system. The cars are tracked using a passive electronic tag that is placed on the windshield or under the car’s frame. Although the E-Pass is designed for automatic toll collection, the system can also be used to precisely calculate the speed of automobiles, track stolen cars, or even snoop on errant spouses. In the future, these records could be used for marketing as well. Automatic toll collection systems create a gold mine of private information. Nevertheless, there have been few public discussions on the appropriate uses of this data. [Statement courtesy Orlando-Orange County Expressway Authority]

In June 1993, Gates decided to have a different kind of hunt. It was the first hunt in which the goal was simply to find as much information as possible about the person behind an email address.

Within one week, the hunt’s 32 teams discovered 148 different pieces of information about the life of Ross Stapleton.^[16] A computer at the University of Michigan reported that Stapleton had B.A. degrees in Russian Language and Literature and in Computer Science. A computer at the University of Arizona reported that he had a Ph.D. in Management Information Systems. A computer operated by the U.S. military’s Defense Data Network (DDN) Network Information Center divulged Stapleton’s current and previous addresses and phone numbers. And a conference brochure on a Gopher server operated by Computer Professionals for Social Responsibility reported that Stapleton was one of the conference’s speakers, and that he was an analyst in the Office of Scientific and Weapons Research at the U.S. Central Intelligence Agency.

But the most revealing information the group assembled came from statements Stapleton himself had made. By scanning messages he had sent to the COM-PRIV mailing list (ironically, a mailing list devoted to privacy issues), the group learned that Stapleton used the OS/2 operating system and didn’t have a fax machine. They learned that he was also affiliated with Georgetown University, where he was an adjunct professor and taught courses on the Information Age. They discovered that Stapleton subscribed to the Arlington Journal, the Chronicle of Higher Education, and Prodigy. He was a member of the AAASS (American Association for the Advancement of Slavic Studies). His Cleveland Freenet membership number was #ak287.

From the dedication in Stapleton’s dissertation, “Personal Computing in the CEMA Community,” the hunters discovered that Stapleton’s parents were named Tom and Shirle. From the header of another email message he sent, they discovered that he was engaged, and that his fiancée’s name was Sarah Gray. Transcripts of Stapleton’s comments at the Second Conference on Computers, Freedom, and Privacy were also unearthed.^[17]

“Stepping back a bit and taking the hunt results as a whole, one can see that there’s an awful lot of information that can be found on someone, even when restricted to freely accessible, publicly available Nets,” said organizer Rick Gates in his report on the hunt. “I hope that people keep that in mind when they are posting to an email listserv or newsgroup. They are really adding to the sum total of the Nets, and what they have to say in some limited discussion of an [obscure] topic may be around for a long time.”

An odd side effect of the global database is that it is easier to seek out information on people who have unique or unusual names. For instance, I tried searching the Internet in February 1998 for the phrase “Tom and Shirle.” HotBot, an Internet search engine, found the word “Tom” on 1,833,334 pages and the word “And” on 63,502,825 pages. But the word “Shirle” was on just 333 pages, and the phrase “Tom and Shirle” was on six pages—all of which, it turns out, were copies of Gates’s June 1993 report.
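Why an unusual name is so easy to find follows from how a keyword index works. The sketch below is a toy inverted index, assumed for illustration and not a description of how HotBot was actually built: each word maps to the set of pages containing it, and a multi-word query intersects those sets, starting with the smallest. The result can never be larger than the posting list of the rarest word, which is why 333 pages for “Shirle” mattered far more than 63 million for “And.” The sample pages are hypothetical.

```python
from collections import defaultdict


def build_index(pages: dict) -> dict:
    """Map each lower-cased word to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


def candidate_pages(index: dict, phrase: str) -> set:
    """Pages containing every word of the phrase (word order is not checked).

    Intersecting from the rarest word first means the candidate set can
    never grow beyond that word's posting list.
    """
    words = phrase.lower().split()
    if not words:
        return set()
    postings = sorted((index.get(w, set()) for w in words), key=len)
    result = set(postings[0])  # copy, so the index itself is not modified
    for posting in postings[1:]:
        result &= posting
    return result


if __name__ == "__main__":
    pages = {
        1: "tom and jerry",
        2: "tom and shirle stapleton",
        3: "and so on",
        4: "shirle and tom",
    }
    print(sorted(candidate_pages(build_index(pages), "Tom and Shirle")))
```

Page 4 contains all three words but not the phrase itself; a real engine would add a word-position check to filter it out, but the rare word has already narrowed the search to a handful of candidates.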

“I was pleasantly surprised to see the amount of information that I myself put out that they managed to find,” said Stapleton when I interviewed him for this chapter. “Nothing came out during the hunt that I would have said alarmed me.” But Stapleton had been worried that somebody at the CIA might be angry that he had revealed his name and employer in so many public forums. “It was only going to be a matter of time before somebody at work said, ‘Hey, what have you been doing?’”

Perhaps what’s most remarkable about the June 1993 Internet Hunt is that it no longer seems remarkable that such a detailed profile of a person could be constructed from publicly available sources. The explosion of online information sources, combined with advertiser-supported search-and-retrieval services like Yahoo, Lycos, and AltaVista, has made it possible to easily assemble these kinds of detailed profiles. Indeed, several services, such as DejaNews and HotBot, specifically advertise this ability.

The Age of Public Statements

Posts to email forums, Usenet groups, and online chat services are all different kinds of public statements. Most people who decide to take their place in cyberspace eventually start making these statements. And these statements are not like any others ever uttered in the course of human history. In the past, statements made in public were frequently lost. Yes, they could be recorded, but those records were almost always hard to retrieve, or even inaccessible. An angry farmer might speak up at a town meeting and have his name recorded in the minutes, but ten years later, somebody trying to do a background investigation on that farmer would be unlikely to find his remarks, especially if the farmer had moved to Seattle and started a new life as a programmer at Microsoft. Letters written to newspapers in the 1950s, 1960s, 1970s, and 1980s were certainly published for everybody to see, but they were rarely indexed in computerized databanks and made instantly available anywhere in the world.

This new generation of public statements is quantitatively different from anything that has ever come before. These are public statements that can be instantly searched out by a prospective employer, by a person with whom you have just had your first date, or by a coworker who means you harm. And once you’ve made a statement, it is out of your control: retraction has become an impossibility.

It is this search capability that is creating a new kind of absolute accountability. It’s a simple matter to use the Internet’s searching capabilities to get a list of people who have admitted to taking LSD, or who have used racist slurs in print, or who have a history of organizing for labor unions. Says Stapleton, “It’s increasingly easy for someone in an HR department to say, ‘Look, Joe here says that skydiving is cool. Do we want to carry him on the rolls considering that he might die? Jane here is in a lifestyle that the chairman might not find attractive. We might not want to put her forward for the public affairs spot.’ I don’t have any public activities that I don’t want to post about. If I did, I would be very cautious.”

Ultimately, the wide availability of this information might create powerful new social filters through which only the boring and reserved will be able to pass. The existence of this information makes opinionated people vulnerable to all sorts of malicious attacks. Pervasive recording and indexing of public statements might keep the best and the brightest from ever holding elected office.

The end of the 1993 Internet Hunt report contains this prescient note: “In short, we’re dealing with a unique medium here. It sort of feels like verbal discussion, but it’s a lot more enduring, and can reach millions of people.”

Ironically, Gates’s report endures to this day, and will probably endure for decades more. That’s because digitized text is very portable, very compact, and very easy to search. Although the computer on which he originally typed and posted his report has long since been retired, the data has been copied again and again and again.

On May 12, 1999, the Boston Herald ran a front-page story titled “Waste.com.”^[18] The story detailed the results of an in-depth investigation the Herald had conducted into Internet use by public employees and others using taxpayer-funded Internet accounts. It found that an account registered to the state auditor’s office was being used to scalp tickets to a sporting event, a violation of state law. It found that an account belonging to MassEd.Net, a taxpayer-funded organization that subsidizes Internet access for teachers and schools, was being used “to promote a sex-and-wrestling Web site.” It found that an account registered to the Department of Public Works “was used to buy and sell erotic Japanese cartoons, including a cartoon series called ‘Rapeman’ that glorifies rape.” It noted that an Internet user at the Secretary of State’s office had sent 324 messages about TV shows, including The Simpsons. And it found students using their high school Internet accounts to trade advice on making and buying LSD and other hallucinogens.

The source material for the news story came almost entirely from searches on the Internet search engine Deja.com, which archives postings to the Internet’s Usenet bulletin board system. Although Usenet messages can be easily forged, this possibility was never discussed in the Herald story.

The special report generated an immediate response from state officials, who promised that they would enforce their existing policies on Internet use and put in place new ones to prohibit inappropriate uses of computer systems. It was stunning testimony to the power of Internet archives to hold people accountable for what they do with their computers.

Smart Machines Create Active Databanks

On April 14, 1999, computer maker Hewlett-Packard ran a three-page advertisement in the Wall Street Journal. The first two pages were a massive black-and-white spread showing a rather well-kept garage with a big empty space in the middle. A car had recently been removed. The text reads:

Your daughter inherited it from you. The lead foot, that is. And you left your vintage Jaguar in the garage. You think. Only you’re out of town, so you’re not sure. Enter e-services. E-what? A security chip in the car recognizes your daughter’s key and engages a “soft limit” that won’t allow the car to exceed 65 mph. Which, of course, she attempts to do. Instantly, the car sends a signal to a service you subscribe to, alerting you to what’s going on. Three thousand miles away, you excuse yourself from the dinner table and as you walk towards the lobby you push your speed dial. Your daughter is no more than three blocks from the driveway when the car phone begins ringing. How’s that again? Businesses and services are using the Internet in ways that go far beyond today’s websites. They’re adding a whole new dimension to the term “service.” The next chapter of the Internet is about to be written. And it has nothing to do with you working the Web. Instead, the Internet will work for you. http://www.hp.com/e-services.
The next E. E-services. Hewlett-Packard.

Hewlett-Packard’s vision of an active world begins to hint at the not-so-benevolent future that could await us. Why does the HP chip in the Jaguar block the daughter’s attempt to speed, but not her parents'? Why does the parent get the phone call from the car, and not the local police? Why isn’t the insurance company notified about the unsafe driver? Why doesn’t the car’s dealer get a report of the speeding and use it to invalidate the warranty on the car’s transmission? Perhaps the next chapter of the Internet will allow automobiles to automatically deduct the cost of a speeding ticket directly from your bank account, without the added cost to society of having a police officer chase you down.

Why should you, the data subject, control the data shadow of everything you do today?

Turning Back the Information Tide

Faster machines, bigger hard disks, and intelligent database systems are all ultimately big threats to privacy. While the ability of computers to store information is increasing at between 60% and 70% per year, the world’s population is increasing at only 1.6% per year. All other things being equal, an increasing percentage of our daily activities will be captured by the world’s datasphere over time.
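The compounding behind this argument is easy to check. Under the simplifying assumption that both rates hold steady, storage capacity per person grows each year by the ratio of the two growth factors. The sketch uses the rates from the text; the ten-year horizon and the function name are choices made for illustration.

```python
def storage_per_person_multiplier(storage_growth: float,
                                  population_growth: float,
                                  years: int) -> float:
    """How many times storage capacity per person grows after `years` of compounding.

    Both rates are annual fractions, e.g. 0.65 for 65% and 0.016 for 1.6%.
    """
    return ((1 + storage_growth) / (1 + population_growth)) ** years


if __name__ == "__main__":
    for rate in (0.60, 0.70):
        multiplier = storage_per_person_multiplier(rate, 0.016, 10)
        print(f"{rate:.0%} storage growth: about {multiplier:,.0f}x "
              f"more storage per person after 10 years")
```

Over a decade, the 60% and 70% figures work out to roughly 90 to 170 times as much storage for every person on the planet.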

So what’s the answer? Are we facing a future in which all of our lives need to be read like an open book, in which all of our secrets are kept inside glass file cabinets? Will we be increasingly monitored by our neighbors, our family, and even our machines, until we are all living inside a transparent society? Perhaps. But we do have a choice. We cannot turn back the clock, but we can build a world in which sensitive data is respected and kept private.

Take the case of Judge Bork. The journalist who pulled Bork’s video rental records triggered a series of hearings on Capitol Hill. Cynics said that the senators and congressmen were worried that their own video records might suddenly become fair game—and that the legislators, unlike Bork, had something to hide. But whatever their reason, the hearings revealed that the Bork incident was far from isolated. “Various examples of demands for video transactional records were mentioned [in the hearings], including an attempt to use video tape records to show that a spouse was an unfit parent, and a defendant in a child molestation case who wanted to show that the child’s accusations were based on movies viewed at home,” reported the Department of Commerce.^[19]

Those hearings weren’t idle chat. Before the end of that legislative session, Congress passed and President Reagan signed the Video Privacy Protection Act of 1988 (18 USC 2710). Under the law, “A video tape service provider who knowingly discloses, to any person, personally identifiable information concerning any consumer” who rents or purchases a videotape is liable in a civil action for actual damages, with a statutory minimum of $2,500, plus punitive damages, reasonable attorney’s fees, and any other relief that the court deems appropriate. By forbidding your local video store from giving out the titles of the movies you rent (without a court order, that is), the act took video rental records off the table. And by defining statutory damages, Congress eliminated a problem that plagues many privacy suits: the need to prove real damages. Furthermore, by allowing an aggrieved individual to recover reasonable attorney’s fees and other litigation costs, Congress ensured that lawyers would be willing to take such cases on a contingency basis.

In many ways, the 1988 law didn’t go far enough. It permits video stores to keep rental records after tapes are returned, rather than requiring that the records be destroyed immediately. The law also allows video rental companies to distill individual rental records into aggregate information, which could then be used as the basis of privacy violations. Nevertheless, the Video Privacy Protection Act has been stunningly effective. Violations of the law are extremely rare. Americans know that they can rent whatever videos they wish and not be forced to answer to anybody.

The Video Privacy Protection Act proves what many privacy advocates have been saying since the 1960s: the free market and voluntary privacy standards are frequently not sufficient to protect consumer privacy. An editorial that appeared in USA Today put it this way: “While voluntary compliance might be preferable in an ideal world, it’s not likely to work in the real world. The reality is that the absence of government prodding has resulted in too many companies doing too little to protect consumers’ privacy rights.”^[20]

Many businesses collect large amounts of personal information in the course of day-to-day operations. But just because the data has been collected, it doesn’t follow that the business has the right to make it publicly available, sell it on the open market, or use it for marketing. Data can be taken off the table. Strong privacy laws give businesses the incentive to do so.

An equally valid way to protect privacy is to prevent the accumulation of personal information in the first place. For example, instead of building an Electronic Toll Collection system that keeps account balances and toll-crossing information in a central database, it’s possible to build anonymous toll-collection systems. These systems are based on smart cards and use a form of digital cash for the toll payments. The smart card in these systems can be programmed to keep a record of each toll crossing for the driver’s own use, or it can be programmed to throw this information away. Distributed smart card systems can be cheaper to build and operate than those based on massive centralized computers. Unfortunately, they are less popular, apparently because the technology is more difficult to explain to decision makers.
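A minimal sketch of where the data lives in such a system, assuming a simple prepaid stored-value card rather than true digital cash: the class names and fields are hypothetical, and the cryptography a real system needs (for example, to authenticate the card and prevent forged balances or double spending) is deliberately omitted. The point is structural: the plaza records only that a toll was paid, and the only trip log is the optional one kept on the driver’s own card.

```python
from dataclasses import dataclass, field


@dataclass
class StoredValueCard:
    """HYPOTHETICAL prepaid toll card: the balance lives on the card, not in a central database."""
    balance_cents: int
    keep_log: bool = True                      # driver's choice: keep a private trip log, or not
    log: list = field(default_factory=list)    # (plaza, amount) pairs, stored only on the card

    def pay(self, plaza: str, toll_cents: int) -> bool:
        """Deduct the toll locally; refuse if the balance is too low."""
        if toll_cents > self.balance_cents:
            return False
        self.balance_cents -= toll_cents
        if self.keep_log:
            self.log.append((plaza, toll_cents))
        return True


@dataclass
class TollPlaza:
    """HYPOTHETICAL plaza that learns only that a toll was paid, not who paid it."""
    name: str
    receipts: list = field(default_factory=list)  # amounts only; no card or driver identifier

    def collect(self, card: StoredValueCard, toll_cents: int) -> bool:
        paid = card.pay(self.name, toll_cents)
        if paid:
            self.receipts.append(toll_cents)
        return paid


if __name__ == "__main__":
    plaza = TollPlaza("Plaza A")
    private_card = StoredValueCard(balance_cents=1000, keep_log=False)
    print(plaza.collect(private_card, 150), private_card.balance_cents, plaza.receipts)
```

With `keep_log=False`, a crossing leaves behind nothing but an anonymous amount in the plaza’s receipts; with `keep_log=True`, the driver gets a statement that nobody else holds.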

Overall, an informed and organized citizenry rarely fails to push through strong privacy measures. Consider Hong Kong: in the mid-1980s, Hong Kong’s colonial government built a sophisticated system for electronic road pricing. Shortly after the system was deployed, drivers began receiving statements showing where and when they had traveled, and they became alarmed. Fearing that the system could be used to track people for political purposes, especially after the 1997 handover of Hong Kong to China, the citizens succeeded in having the system shut down.^[21]

If decision makers fail to act responsibly, there is always direct action. When people discover that their information is being used against them, they rebel, either by intentionally withholding their information or by explicitly planting false data into the system. For example, many Internet users have responded to the problem of unsolicited junk email, also known as spam, by using mangled email addresses on their web pages and in their news postings. More people are using fake or intentionally misspelled names when subscribing to magazines. And many people use cash, rather than credit cards, even when it is inconvenient to do so. If these measures are not sufficient, even more aggressive techniques are likely to follow.

^[1]James Finn and Leonard R. Sussman, eds., Today’s American: How Free? (New York: Freedom House, 1986), p. 111.

^[2]Ibid.

^[3]U.S. Department of Justice Drug Enforcement Administration, “U.S. Drug Threat Assessment: 1993. Drug Intelligence Report. Availability, Price, Purity, Use, and Trafficking of Drugs in the United States,” September 1993, DEA-93042. Available online at http://mir.drugtext.org/druglibrary/schaffer/GOVPUBS/usdta.htm.

^[4]“TV-Movie Actress Slain in Apartment,” Associated Press, July 19, 1989. “Arizona Holds Man in Killing of Actress,” Associated Press, July 20, 1989. “Suspect in Slaying Paid to Find Actress,” Associated Press, July 23, 1989.

^[5]NYNEX advertisement, mailed to customers in Spring 1997.

^[6]Interview by author, September 9, 1997.

^[7]Bruce Schneier, “Why Intel’s ID Tracker Won’t Work,” ZDNet News, January 26, 1999. Republished in RISKS Digest 20:19. Available online at http://catless.ncl.ac.uk/Risks/20.19.html#subj4.

^[8]Westin, Databanks in a Free Society, p. 93.

^[9]The companies offering competing systems are American Veterinary Identification Devices (AVID), which runs the PETtrac recovery network; HomeAgain, which resells the Destron chip; InfoPet Systems, which sells the Trovan system; and PetNet, which resells the Anitech chip. Over the past three years, veterinarians and pet enthusiasts have argued over which chip is better, which is cheaper, which is easier to read, and so forth. The companies have responded by trying to build readers that can read each other’s chips, giving away free readers to shelters (in hopes of stimulating chip sales), and generally nipping at each other’s heels. As industrial applications take off, they’re likely to leave pet-chipping far in the dust. Trovan, for instance, sells a ruggedized version of its ID 100 microtransponder called the ID 103. This transponder is specifically designed for industrial applications and the garment industry. It’s encapsulated with a double-thick glass wall so that it can survive rollers and garment presses. It can survive temperatures up to 180°C. And it can be inserted into plastic as it cools, making the identification tag a permanent part of the item.

^[10]Kate Murphy, “Get Along Little Dogie #384-591E: Laptop Cowboys Riding Herd on the Electronic Frontier,” New York Times, July 21, 1997.

^[11]ITS America News, April 1997, pp. 6–8.

^[12]The Driver’s Privacy Protection Act of 1994, which took effect in 1997, requires that states allow individuals to opt out of motor vehicle databases before data is made available to marketers.

^[13]Interview by author, June 27, 1997.

^[14]Police Commissioner v. Triborough Bridge and Tunnel Authority (Sup. Ct. NYC IA Part 50R, June 26, 1997), as reported in the Privacy Journal, October 1997.

^[15]Robert Uhlig, “Spy Phones Trace Cheating Husbands,” Electronic Telegraph, August 27, 1997. Available at http://www.telegraph.co.uk:80/et?ac=002093890554028&rtwo=r3bhbhhx&atmo=99999999&pg=/et/97/8/27/nbt27.html, as reported in the August 29, 1997 issue of RISKS Digest.

^[16]Interview by author, August 1997.

^[17]The Second Conference on Computers, Freedom, and Privacy, Washington, D.C., 1992. See http://www.cpsr.org/dox/conferences/cfp92/home.html.

^[18]Joseph Malia, “Waste.com: Public Employees Using Internet for Sex, Drugs, and Rock ‘n’ Roll,” Boston Herald, May 12, 1999, p. 1. Full text available online at http://www.bostonherald.com/bostonherald/lonw/emai05121999.htm and http://www.mapinc.org/drugnews/v99.n505.a11.html/lsd.

^[19]U.S. Department of Commerce, Privacy and the NII: Safeguarding Telecommunications-Related Personal Information, October 1995. Available at http://nsi.org/Library/Comm/privnii.html.

^[20]Editorial, USA Today, October 25, 1995.

^[21]The Diebold Institute for Public Policy Studies, Inc., Transportation Infostructures (Westport, CT: Praeger, 1995).


Database Nation by Simson Garfinkel