Database Nation by Simson Garfinkel

The Tracking Process: How Our Information Is Turned Against Us

Nobody set out to build a society in which the most minute details of everyday life are permanently recorded for posterity. But this is the future that we are marching towards, thanks to a variety of social, economic, and technological factors.

Humans are born collectors. Psychologically, it's much easier to hold on to something than to throw it away. This is all the more true for data. Nobody really feels comfortable erasing business correspondence or destroying old records—you never know when something might be useful. Advancing technology is making it possible to realize our collective dream of never throwing anything away—or at least never throwing away a piece of information.

The first computer that I bought in 1978 stored information on cassette tapes. I could fit 200 kilobytes on a 30-minute cassette, if I was lucky. The computer that I use today has an internal hard disk that can store 6 gigabytes of information—a 30,000-fold increase in just two decades. And this story is hardly unique: all over the world, businesses, governments, and individuals have seen similar improvements in their ability to store data. As a civilization, we've used this newfound ability to store more and more minute details of everyday existence. We are building the world's datasphere: a body of information that describes the Earth and our actions upon it.
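The storage comparison above can be checked, and turned into an annual growth rate, with a few lines of arithmetic. This is a sketch under two simplifying assumptions that are not in the text: decimal units (1 kilobyte = 1,000 bytes, 1 gigabyte = 10^9 bytes) and "two decades" taken as exactly 20 years.

```python
import math

# Assumption: decimal units (1 KB = 1,000 bytes, 1 GB = 10**9 bytes).
cassette_bytes = 200 * 10**3     # about 200 kilobytes on a 30-minute cassette (1978)
disk_bytes = 6 * 10**9           # 6-gigabyte internal hard disk
years = 20                       # assumption: "two decades" taken as exactly 20 years

growth = disk_bytes / cassette_bytes                  # overall increase
annual_factor = growth ** (1 / years)                 # constant yearly multiplier
doubling_years = math.log(2) / math.log(annual_factor)

print(f"{growth:,.0f}-fold increase")
print(f"about {annual_factor - 1:.0%} more capacity per year")
print(f"capacity doubles roughly every {doubling_years:.1f} years")
```

Under these assumptions, capacity grew by roughly two-thirds every year, doubling about every 16 months.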

Building the world's datasphere is a three-step process—one that we've been blindly following without considering its ramifications for the future of privacy. First, industrialized society creates new opportunities for data collection. Next, we dramatically increase the ease of automatically capturing information into a computer. The final step is to arrange this information into a large-scale database so it can be easily retrieved at a moment's notice.

Once the day-to-day events of our lives are systematically captured in a machine-readable format, this information takes on a life of its own. It finds new uses. It becomes indispensable in business operations. And it often flows from computer to computer, from business to business, and between industry and government. If we don't step back and stop the collection and release of this data, we'll soon have a world in which every moment and every action is permanently "on the record."

Step 1: Make Data Collectable

The first step to building the global datasphere is to create information worth collecting. Consider a forest: by itself, a mountain of trees has no data. Now go through the forest and number every tree, estimate its age and height, and survey its location, and you have created an extremely valuable data set for both environmentalists and the timber industry.

I got a very good introduction to this first step in 1988, when I visited a BASF floppy disk manufacturing plant located just outside Boston. As part of the manufacturing process, I learned, each floppy disk is stamped with a code called a lot number. Pick up a floppy disk and flip it over, and you're likely to find codes like these today: A2C5114B, or S2078274, or 01S1406. These codes identify the manufacturer of the disks, the factory, the particular machine where the disk was created, and the date and time that the lot was started. Sometimes the information is encoded directly into the number. Other times, the lot number merely refers to an entry in a logbook or a process control system. Either way, decoding a lot number usually requires proprietary information that manufacturers are rarely willing to share with the general public.

The primary purpose of these lot numbers is quality control. If a run of bad disks turns up, the manufacturer can look at the lot number on the bad disks and figure out where they came from. By examining the factory's records, a quality control engineer can find the exact piece of equipment that caused the problem—which is the first step to preventing the problem from recurring in the future. Ultimately, this saves the company money and improves its reputation.
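The quality-control lookup described above amounts to joining the lot numbers on returned units against the factory's records and counting failures per machine. A minimal sketch follows; the lot-to-machine table and all lot numbers are hypothetical, since real lot encodings are usually proprietary.

```python
from collections import Counter

# Hypothetical factory records: lot number -> (plant, machine) that produced it.
lot_records = {
    "L1001": ("Plant A", "Coater 3"),
    "L1002": ("Plant A", "Coater 3"),
    "L1003": ("Plant A", "Coater 1"),
    "L2001": ("Plant B", "Coater 2"),
}

def defects_by_machine(defective_lots):
    """Count defective units per (plant, machine) using the lot number on each unit."""
    counts = Counter()
    for lot in defective_lots:
        # Lots missing from the records are grouped under "unknown".
        source = lot_records.get(lot, ("unknown", "unknown"))
        counts[source] += 1
    return counts

# Lot numbers read off returned bad disks (hypothetical).
returns = ["L1001", "L1002", "L1001", "L2001", "L1002"]
worst, count = defects_by_machine(returns).most_common(1)[0]
print(f"Most defects traced to {worst[1]} at {worst[0]} ({count} units)")
```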

Once you know how to recognize lot numbers, you'll start seeing them everywhere: on a candy bar wrapper, a bottle of pills, or the rim of a flashlight. Some objects are so important that each one gets its own tracking number, in which case the number is called a serial number. Turn over the mouse that's connected to your desktop computer and you'll find one. There's another serial number on your computer itself, as well as on many of the individual components inside. It's all quite ironic. In the early days of the Industrial Revolution, one of the biggest technical challenges that engineers faced was producing functionally identical, interchangeable parts. Today, we have become so good at making things indistinguishable that we now need to imprint each with its own code so that we can tell them apart.

Lot numbers and serial numbers all serve a fundamental purpose: by making seemingly identical things distinguishable, the numbers make the history of the things recordable. But once inscribed, these codes can be used for much more than simple quality control: increasingly, lot numbers and serial numbers are being used for law enforcement.

Lot numbers can prove vital to a product-tampering investigation, for example. If tampered products at different stores all come from the same lot, then the tampering probably took place at the factory. If tampered products all come from different lots—possibly manufactured at different plants—then the tampering almost certainly took place at the store or in the home.
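The reasoning in the tampering example can be written as a simple rule over the lot numbers of tampered items found at different stores. This sketches only that heuristic, with hypothetical lot numbers; a real investigation would weigh far more evidence, and mixed patterns are left inconclusive here.

```python
def likely_tampering_site(tampered_lots):
    """Apply the same-lot / different-lot heuristic to tampered items from different stores."""
    lots = list(tampered_lots)
    distinct = set(lots)
    if len(lots) < 2:
        return "inconclusive"      # a single item says nothing about a pattern
    if len(distinct) == 1:
        return "factory"           # every item came from the same production lot
    if len(distinct) == len(lots):
        return "store or home"     # no two items share a lot; they met only after distribution
    return "inconclusive"          # mixed pattern: the simple rule does not apply

print(likely_tampering_site(["LOT-7", "LOT-7", "LOT-7"]))  # factory
```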

One of the most successful tracking numbers in recent years is the Vehicle Identification Number (VIN), a 17-character code that is stamped on the dashboard, engine, and axle of every car and truck manufactured in the world. The VIN was created in the 1970s by a coalition of auto makers and national governments who wanted a worldwide standard, according to Thomas Carr, manager of passenger safety regulations for the American Automobile Manufacturers Association.[6] Sixteen of the VIN's characters identify the manufacturer, the country of manufacture, the make and model of the vehicle, the assembly plant where it was built, the year it was built, information on the car's restraint system, the kind of transmission and rear axle used in trucks, and a six-digit sequential production number.

The remaining character, in the ninth position, is special. It's called a check digit, and it is required on vehicles sold in North America. This digit doesn't contain any information of its own; instead, it is computed from the other characters. The check digit makes the VIN self-verifying, letting a computer automatically detect many common typographical errors, such as transposing two characters or hitting an adjacent key on a computer keyboard. Since the VIN is the key that is used to index all of the records for a particular motor vehicle, explains Carr, being able to verify that a VIN has been correctly typed is very important.
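Under the North American VIN rules (49 CFR Part 565), the check digit sits in the ninth position. Letters are transliterated to numbers, each position is multiplied by a fixed weight, the products are summed, and the total is reduced modulo 11, with a remainder of 10 written as "X". The sketch below follows that scheme; the sample VIN is a widely used illustrative example, not a real vehicle record.

```python
# Letter-to-number transliteration (I, O, and Q are not allowed in a VIN).
LETTER_VALUES = {
    **dict(zip("ABCDEFGH", range(1, 9))),
    **dict(zip("JKLMN", range(1, 6))),
    "P": 7, "R": 9,
    **dict(zip("STUVWXYZ", range(2, 10))),
}
# Weight for each of the 17 positions; position 9 (the check digit itself) has weight 0.
WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]

def char_value(ch):
    if ch in "0123456789":
        return int(ch)
    if ch in LETTER_VALUES:
        return LETTER_VALUES[ch]
    raise ValueError(f"invalid VIN character: {ch!r}")

def vin_check_digit(vin):
    """Compute the check digit ('0' to '9' or 'X') for a 17-character VIN."""
    vin = vin.upper()
    if len(vin) != 17:
        raise ValueError("a VIN has 17 characters")
    total = sum(char_value(c) * w for c, w in zip(vin, WEIGHTS))
    remainder = total % 11
    return "X" if remainder == 10 else str(remainder)

def is_valid_vin(vin):
    """True if the ninth character matches the computed check digit."""
    try:
        expected = vin_check_digit(vin)
    except ValueError:
        return False
    return vin.upper()[8] == expected

print(vin_check_digit("1M8GDM9AXKP042788"))   # X
print(is_valid_vin("1M8GDM9AXKP042788"))      # True
print(is_valid_vin("1M8GDM9AXKP042878"))      # False: characters 15 and 16 transposed
```

Because the weights differ from position to position, swapping two unequal characters changes the weighted sum, which is what lets the check catch most transpositions.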

VINs are used to track the car throughout the entire production process. After the car leaves the plant, VINs are used by governments to keep track of who owns each car, both to collect taxes and to help return stolen cars to their rightful owners. And in recent years, VINs have found a new role—solving car and truck bomb cases. In the bombings of both the World Trade Center in New York City and the Murrah Federal Building in Oklahoma City, investigators were able to quickly locate the axles of the trucks that were blown up. The VINs that were stamped on the axles allowed investigators to determine the trucks' owners, which allowed them to determine where the trucks had been rented—which in both cases led them to the identities of the bombers.

Step 2: Make Data Machine-Readable

Automatic data collection is the second big step needed to create the datasphere. Automated systems read a piece of information and feed it directly into a computer, without human intervention. Although automated systems can be expensive to set up, once they are operational, they dramatically lower the cost of data collection, making it possible to create huge data sets and to keep them up to date. As a result, when a few major players in an industry start to adopt an automated system, the entire industry quickly follows.

The U.S. banking industry was one of the first major segments of our economy to adopt machine-readable codes. In 1963, a few banks started printing checks using special magnetic ink, so that computers could automatically read the nine-digit bank routing numbers, account numbers, and check numbers printed across each check's bottom. It was a good idea: by 1969, 90% of the checks in the United States were printed with the shiny black numbers, greatly decreasing the time required to process them.[8] In the 1970s, the banking industry started adding magnetic stripes to credit cards so the little pieces of plastic could be swiped through a reader. Before then, credit card numbers were transferred to a paper slip using a piece of carbon paper and a roller, and then had to be entered into a computer by hand.
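The nine-digit routing number in the magnetic-ink line carries its own check digit, so a reader can reject many misreads automatically. The sketch below uses the standard ABA checksum (digits weighted 3, 7, 1, repeated; the weighted total must be divisible by 10). This detail is not from the text, and the sample numbers are chosen only because they satisfy the checksum.

```python
def routing_number_is_valid(routing):
    """Check a nine-digit ABA routing number against its weighted checksum."""
    if len(routing) != 9 or not all(c in "0123456789" for c in routing):
        return False
    weights = [3, 7, 1] * 3
    total = sum(int(d) * w for d, w in zip(routing, weights))
    return total % 10 == 0

print(routing_number_is_valid("011000015"))   # True
print(routing_number_is_valid("011000016"))   # False: last digit misread
```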

Other industries have been slower to adopt machine-readable systems. It wasn't until the mid-1990s that General Motors started supplementing the original VIN plates with machine-readable bar codes. Unlike the old VIN, which could only be read up close by a human, the new bar code can be read from more than 20 feet away using a high-speed laser scanner. Once in place, the bar code VIN quickly gained adherents. One company that jumped on the bandwagon was the car rental agency Avis, which now uses laser scanners to automatically track cars as they are returned at the company's drop-off locations. In the coming years, these machine-readable VINs will increasingly be a part of most drivers' lives. For example, urban garages might use the bar codes to automatically open gates for their monthly patrons. Other companies have developed computerized vision systems that can read the license plate of a stopped or moving car, creating another system for automatically identifying automobiles at a distance.

Moving away from magnetic and optical systems, the newest machine-readable tags are scanned using radio waves. The technology, called RFID (short for Radio Frequency Identification), consists of two parts: a tiny silicon chip with a small radio antenna, called the tag or transponder, and a gun-shaped reader. Each chip is manufactured with a unique code. Point the reader at the chip, and the chip's code appears on the reader's display. The code is also sent to an attached computer.

The chips have no moving parts, no batteries to wear out, and an indefinite lifetime.

When you point an RFID reader at a transponder and pull the trigger, the gun fires a burst of radio frequency energy in the direction of the chip. The transponder's antenna picks up this energy and converts it into an electric current, which powers both the transponder's microchip and its tiny on-board radio transmitter. The transponder then sends back the chip's unique code (today's chips use a 64-bit code) on another radio frequency.
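The read cycle just described can be modeled in a few lines: a battery-free tag can answer only while the reader's burst is powering it, and its answer is always the same fixed 64-bit code (2^64, roughly 1.8 × 10^19, possible values). In this sketch the physics is reduced to a simple distance cutoff; the class names, the random code, and the read range are all hypothetical.

```python
import secrets

class PassiveTag:
    """A battery-free tag whose only content is a code fixed at manufacture."""
    def __init__(self):
        self.code = secrets.randbits(64)          # hypothetical: a random 64-bit code

    def on_burst(self, powered):
        # The tag can only transmit while the reader's burst is powering it.
        return self.code if powered else None

class Reader:
    def __init__(self, read_range_m):
        self.read_range_m = read_range_m          # hypothetical read range, in meters

    def pull_trigger(self, tag, distance_m):
        # Simplification: the burst delivers enough energy only within the read range.
        code = tag.on_burst(powered=distance_m <= self.read_range_m)
        return None if code is None else f"{code:016X}"   # 64 bits = 16 hex digits

reader = Reader(read_range_m=0.5)
tag = PassiveTag()
print(reader.pull_trigger(tag, distance_m=0.3))   # the tag's code as 16 hex digits
print(reader.pull_trigger(tag, distance_m=2.0))   # None: out of range, the tag stays silent
```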

Several companies make RFID systems. One of the largest is Trovan, based in the United Kingdom. Trovan's largest device is about the size of a quarter and can be read from two feet away; the smallest is the size of a grain of rice. Readable from 18 inches, the tiny tag is designed to be sewn into the lining of clothing for inventory tracking. Trovan also makes a special implantable tag which comes in a presterilized, ready-to-use disposable syringe; it can be tucked under the skin of an animal in less than 20 seconds.[9]

In England, Yamaha dealers are using Trovan to help fight motorcycle theft. For £65 (about US$100), you can have Trovan chips implanted into your bike's frame, wheels, tank, and seat. If the bike is stolen or stripped, the parts can be identified when somebody comes in trying to sell them.

In the United States, RFID systems are being used for asset management, a technique through which businesses cut costs by carefully managing the items they have already bought. One application is the tracking of gas cylinders. By drilling a small hole in the neck of each cylinder and dropping in an RFID device, a company can accurately track the location of the cylinder as it is moved between the plant and the customer. Other companies have embedded RFID devices in hand-held tools, which workers are then required to check out and check back in like library books.
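Tool tracking of this kind works like a library circulation system keyed on tag codes. A minimal sketch, with hypothetical tag codes and worker names, and an in-memory ledger standing in for whatever database a real company would use:

```python
from datetime import datetime

class ToolCrib:
    """Check RFID-tagged tools in and out, like library books."""

    def __init__(self):
        self.holder = {}    # tag code -> worker currently holding the tool
        self.log = []       # permanent history: (timestamp, action, tag code, worker)

    def check_out(self, tag_code, worker):
        if tag_code in self.holder:
            raise ValueError(f"{tag_code} is already checked out to {self.holder[tag_code]}")
        self.holder[tag_code] = worker
        self.log.append((datetime.now(), "out", tag_code, worker))

    def check_in(self, tag_code):
        worker = self.holder.pop(tag_code)   # KeyError if the tool was never checked out
        self.log.append((datetime.now(), "in", tag_code, worker))
        return worker

    def tools_held_by(self, worker):
        return sorted(tag for tag, who in self.holder.items() if who == worker)

crib = ToolCrib()
crib.check_out("TAG-0001", "alice")   # hypothetical tag codes and worker name
crib.check_out("TAG-0002", "alice")
crib.check_in("TAG-0001")
print(crib.tools_held_by("alice"))    # ['TAG-0002']
print(len(crib.log))                  # 3
```

Note that the log outlives the loans: even after every tool is returned, the record of who held what, and when, remains.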

Meanwhile, implantable tags are being used by zoos around the world to track exotic animals. And in North America, they're being used to track pets: by the summer of 1997, at least 200,000 cats and dogs in the U.S. had been implanted with some form of RFID. Several companies now operate national databases that match each pet's chip ID number with its owner's name, address, and phone number. Organizations like the ASPCA in New York City, San Diego County in California, and the cities of Minneapolis and St. Paul are buying readers. Stray animals found on the street are now being scanned when they are brought to a shelter.
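At the shelter, the whole process reduces to a keyed lookup: scan the chip, then query a registry for the owner. The sketch below uses a hypothetical in-memory registry and a made-up chip ID; the real registries are run by separate companies through their own services, which are not modeled here.

```python
from typing import NamedTuple, Optional

class Owner(NamedTuple):
    name: str
    address: str
    phone: str

# Hypothetical registry: chip ID -> owner contact record.
REGISTRY = {
    "0A1B2C3D4E": Owner("J. Doe", "12 Elm St.", "555-0100"),
}

def lookup_stray(chip_id: Optional[str]) -> str:
    """What a shelter learns after scanning an animal."""
    if chip_id is None:
        return "no chip detected"
    owner = REGISTRY.get(chip_id.upper())
    if owner is None:
        return f"chip {chip_id} is not in this registry"
    return f"call {owner.name} at {owner.phone}"

print(lookup_stray("0a1b2c3d4e"))   # call J. Doe at 555-0100
```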

As these cases show, the power of RFID is that once the radio tag is implanted into an object, the tag becomes a part of that object. A serial number that's on a gun can be filed down or etched away with acid. Cars can be stripped of their VINs. Tattoos can be overgrown with hair, or simply covered by clothing. But put a chip on the inside and the serial number becomes invisible, indelible, and detectable at a distance.

Although the obvious motivation for tracking is to prevent loss, other advantages of increased control and knowledge soon come to light. Some U.S. farmers have discovered that once an animal is given a serial number, it becomes possible to keep highly accurate long-term records. By tracking an animal from birth to slaughter, keeping detailed records of each animal's vaccination history, feed, weight, and handling, and even performing an occasional ultrasound scan, farmers can apply scientific management techniques to their overall operation. Ultimately, the extra work can increase the market value of an animal by approximately $700 to $1000. Meanwhile, the U.S. Department of Agriculture may soon mandate the electronic tracking of cattle in order to combat disease.[10]

Step 3: Build a Big Database

As the tag-wielding U.S. farmers have learned, a good database is what marks the difference between disorganized data and a usable collection of information. But the organization of a database, and the policies that control access to the information the database contains, can dramatically impact the privacy implications of the entire tracking enterprise.

Consider the case of Electronic Toll Collection (ETC). Over the past decade, systems that let automobile and truck drivers pay their highway and bridge tolls electronically have been enthusiastically adopted around the world. The reason: ETC systems put an end to traffic jams around toll plazas. Instead of requiring drivers to stop and toss a few coins into a basket or hand a bill to a toll collector, most ETC systems use a radio tag to uniquely identify a car's account, from which the toll is automatically deducted.

In Norway, Micro Design ASA installed one of the earliest systems on a highway north of Trondheim in 1988. The technology has improved rapidly since then. Today, a system manufactured by Saab Combitech, Sweden, can read an electronic tag in less than 10 milliseconds when the vehicle is traveling at speeds up to 100 miles per hour. The Saab system can also determine the vehicle's speed by measuring the Doppler shift of the returning radio signal.
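Two quick calculations put the Saab figures in perspective. At 100 mph, a car covers less than half a meter during a 10-millisecond read. For a radio signal reflected from a vehicle moving toward the antenna, the frequency shift is approximately Δf ≈ 2 v f0 / c, so the speed follows as v ≈ c Δf / (2 f0). The 5.8 GHz carrier below is an assumed value, not a figure from the text, and the formula yields only the component of the vehicle's velocity along the line to the antenna.

```python
SPEED_OF_LIGHT = 299_792_458.0   # meters per second
MPH_TO_MS = 0.44704              # 1 mile per hour in meters per second (exact)

def distance_during_read(speed_mph, read_time_s):
    """Meters a vehicle travels while its tag is being read."""
    return speed_mph * MPH_TO_MS * read_time_s

def doppler_shift(speed_ms, carrier_hz):
    """Approximate round-trip shift for a signal reflected off a target closing at speed_ms."""
    return 2 * speed_ms * carrier_hz / SPEED_OF_LIGHT

def speed_from_shift(shift_hz, carrier_hz):
    """Invert the relation above to recover the radial speed in m/s."""
    return shift_hz * SPEED_OF_LIGHT / (2 * carrier_hz)

carrier = 5.8e9   # assumption: carrier frequency in Hz, not stated in the text
print(f"{distance_during_read(100, 0.010):.2f} m traveled during a 10 ms read")
shift = doppler_shift(100 * MPH_TO_MS, carrier)
print(f"shift at 100 mph: {shift:.0f} Hz")
print(f"recovered speed: {speed_from_shift(shift, carrier) / MPH_TO_MS:.1f} mph")
```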

In 1994, the New York-area Triborough Bridge and Tunnel Authority (TBTA) installed an ETC system called E-ZPass at tollbooths on the Verrazano Narrows Bridge. After some early snafus, E-ZPass was soon fulfilling its mission, boosting the number of cars that each lane could handle from 250 to 1000 per hour. The public responded enthusiastically: during its first two years of operation, TBTA issued 550,000 E-ZPass tags. "Each work day, we collect 280,000 electronic tolls, or 42 percent of the total transactions," TBTA president Michael Ascher told a trade publication in March 1997.[11] A similar system, E-Pass, has been widely adopted by Florida drivers on the Orlando-Orange County Expressway.
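The E-ZPass figures imply a few numbers that the quotation does not state directly. A quick check, taking the quoted figures at face value:

```python
manual_per_hour, etc_per_hour = 250, 1000          # cars per lane per hour, from the text
electronic_per_day, electronic_share = 280_000, 0.42

capacity_gain = etc_per_hour / manual_per_hour
print(f"lane capacity x{capacity_gain:.0f}")
print(f"seconds per car: {3600 / manual_per_hour:.1f} -> {3600 / etc_per_hour:.1f}")
total = electronic_per_day / electronic_share      # all transactions, electronic and cash
print(f"implied total transactions per work day: about {total:,.0f}")
```

In other words, each lane went from one car every 14.4 seconds to one every 3.6 seconds, and the 42 percent share implies roughly 667,000 toll transactions per work day.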

Among state and federal highway administrators, the big issues with these ETC systems are cost, reliability, and interoperability. Many states have adopted systems that use incompatible tags: E-ZPass uses a windshield-mounted tag, while Florida's E-Pass system uses a radio transponder the size of a flashlight mounted under the car's front bumper. Within a few years, highway administrators hope the U.S. will adopt a single national system that will let a car travel from California to New York, paying all of the intervening bridge and highway tolls electronically.

But administrators have not focused on the privacy implications of the systems they are deploying. And those implications are staggering. The ETC systems maintain a detailed record of each time each car pays a toll. Officially, the ETC systems keep this information so they can send drivers a monthly statement showing them where their money is going. But the database is a gold mine of personal information that has uses far beyond simple accounting. A restaurant could scan it to build a list of everyone who drives by its place of business. A private investigator could use this database to track the movements of an errant spouse. Reporters could track celebrities, and crooks could use it to target a victim.
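To make the point concrete: with nothing more than the per-toll records these systems already keep, both the private investigator's query and the restaurant's query are one-liners. The record layout, tag IDs, plaza names, and times below are hypothetical.

```python
from collections import namedtuple
from datetime import datetime

TollRecord = namedtuple("TollRecord", "tag_id plaza time")

# Hypothetical excerpt of an ETC transaction log; each tag ID maps to an account holder.
records = [
    TollRecord("TAG-17", "Bridge East", datetime(1997, 6, 2, 8, 5)),
    TollRecord("TAG-42", "Bridge East", datetime(1997, 6, 2, 8, 7)),
    TollRecord("TAG-17", "Tunnel North", datetime(1997, 6, 2, 17, 40)),
    TollRecord("TAG-17", "Bridge East", datetime(1997, 6, 3, 8, 2)),
]

def movements(tag_id):
    """Every toll one car paid, in time order: a record of where that car has been."""
    return sorted((r.time, r.plaza) for r in records if r.tag_id == tag_id)

def passersby(plaza):
    """Every tag seen at one plaza: a list of everyone who drives past."""
    return sorted({r.tag_id for r in records if r.plaza == plaza})

for when, where in movements("TAG-17"):
    print(when.strftime("%Y-%m-%d %H:%M"), where)
print(passersby("Bridge East"))    # ['TAG-17', 'TAG-42']
```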

Once states are collecting large amounts of movement information, it is quite likely that it will be used and exploited. Already, cash-strapped state governments are selling their driver's-license databases to companies like R. L. Polk, which are using the data to build marketing lists.[12] But even if the information is not sold, its existence means that some bad guy might someday bribe a state employee to get at the juicy data.

Highway administrators don't seem to be sensitive to these risks. In 1995, the Massachusetts Turnpike Authority (MTA) published a three-inch-thick Request for Proposals to contractors interested in selling electronic toll collection systems to the state. The word "privacy" didn't appear. I called up John Judge, the MTA's Director of Operations, to ask why.

"Privacy is a non-issue," said Judge:

I think that is the experience nationwide, at least as it relates to electronic toll collection. Privacy has not been an issue that has emerged nationally. I think that [is] principally because it is a voluntary system. If you are of a mind where you might be concerned about privacy issues, you just don't have to join the program, and can use the traditional toll collection methods. I don't think that it is any more an issue than credit cards.[13]

Distressingly, U.S. courts seem to agree with Judge, although for different reasons. On June 26, 1997, Justice Colleen McMahon ruled that the Triborough Bridge and Tunnel Authority had to turn over toll-crossing records to police whenever presented with a subpoena. Previously, the TBTA had required police to get a court order for release of the information, something that McMahon said was too restrictive on police. Her reasoning was that the movements of E-ZPass holders were easily observed, and therefore the electronic records should be made public as well.[14]

Positional information is also very much a part of the cellular telephone systems, which must track phones at all times so that calls can be delivered. In 1997, British Telecom announced that it was developing a mobile telephone that would report the caller's location, to within 30 feet, to the person receiving the call. "Workers will no longer be able to phone the office pretending to be sick when they are at the beach, and movements of cheating spouses will be exposed," enthused an article in the Electronic Telegraph.[15] And as part of the U.S. 911 system, cellular providers must be able to locate 60% of all phones to within 150 meters by the year 2001. Like all positional information, this data has multiple uses. Besides helping ambulances reach a car wreck faster, it is increasingly sought by police, who ask for position information when they serve wiretap orders on cellular providers.

The approach to vehicular privacy has been similar across the border in Canada. Ontario's Highway 407 now has a sophisticated system for automatically billing automobile owners for the number of miles their vehicles drive on the public highway. The system uses a video camera to capture the image of the vehicle's license plate. Tolls are assessed when automobile registrations are renewed: people who refuse to pay the bills won't be allowed to renew.
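Distance-based billing of this kind needs only two camera reads per trip and a table of distances between interchanges. The sketch below pairs each entry read with the next exit read for the same plate. The plate numbers, interchange names, distances, and per-mile rate are all hypothetical, and the real system's matching and tariff rules are not described in the text.

```python
from collections import defaultdict

# Hypothetical distances in miles between interchanges.
DISTANCE = {("A", "B"): 6.0, ("A", "C"): 14.5, ("B", "C"): 8.5}
RATE_PER_MILE = 0.10   # hypothetical tariff, in dollars

def miles_between(a, b):
    # Each pair is stored once; look it up in either direction.
    return DISTANCE.get((a, b), DISTANCE.get((b, a), 0.0))

def bill_trips(camera_reads):
    """camera_reads: (plate, interchange, "entry" or "exit") tuples in time order.
    Returns the amount owed per plate, to be collected at registration renewal."""
    open_trips = {}                  # plate -> interchange where the car entered
    owed = defaultdict(float)
    for plate, interchange, kind in camera_reads:
        if kind == "entry":
            open_trips[plate] = interchange
        elif kind == "exit" and plate in open_trips:
            start = open_trips.pop(plate)
            owed[plate] += miles_between(start, interchange) * RATE_PER_MILE
    return {plate: round(amount, 2) for plate, amount in owed.items()}

reads = [
    ("ABC123", "A", "entry"),
    ("XYZ789", "B", "entry"),
    ("ABC123", "C", "exit"),
    ("XYZ789", "A", "exit"),
]
print(bill_trips(reads))   # {'ABC123': 1.45, 'XYZ789': 0.6}
```

As with the toll-tag systems, the camera log needed for billing is itself a record of where each plate entered and left the highway, and when.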



[6] Interview by author, September 9, 1997.

[7] Bruce Schneier, "Why Intel's ID Tracker Won't Work," ZDNet News, January 26, 1999. Republished in RISKS Digest 20:19. Available online at http://catless.ncl.ac.uk/Risks/20.19.html#subj4.

[8] Westin, Databanks in a Free Society, p. 93.

[9] The companies offering competing systems are American Veterinary Identification Devices (AVID), which runs the PETtrac recovery network; HomeAgain, which resells the Destron chip; InfoPet Systems, which sells the Trovan system; and PetNet, which resells the Anitech chip. Over the past three years, veterinarians and pet enthusiasts have argued over which chip is better, which is cheaper, which is easier to read, and so forth. The companies have responded by trying to build readers that can read each other's chips, giving away free readers to shelters (in hopes of stimulating chip sales), and generally nipping at each other's heels. As industrial applications take off, they're likely to leave pet-chipping far in the dust. Trovan, for instance, sells a ruggedized version of its ID 100 microtransponder called the ID 103. This transponder is specifically designed for industrial applications and the garment industry. It's encapsulated with a double-thick glass wall so that it can survive rollers and garment presses. It can survive temperatures up to 180 °C. And it can be inserted into plastic as it cools, making the identification tag a permanent part of the item.

[10] Kate Murphy, "Get Along Little Dogie #384-591E: Laptop Cowboys Riding Herd on the Electronic Frontier," New York Times, Monday, July 21, 1997.

[11] ITS America News, April 1997, pp. 6–8.

[12] The Driver's Privacy Protection Act, which took effect in 1997, requires that states allow individuals to opt out of motor vehicle databases before data is made available to marketers.

[13] Interview by author, June 27, 1997.

[14] Police Commissioner v. Triborough Bridge and Tunnel Authority (Sup. Ct. NYC IA Part 50R, June 26), as reported in the Privacy Journal, October 1997.

[15] Robert Uhlig, "Spy Phones Trace Cheating Husbands," Electronic Telegraph, August 27, 1997. Available at http://www.telegraph.co.uk:80/et?ac=002093890554028&rtwo=r3bhbhhx&atmo=99999999&pg=/et/97/8/27/nbt27.html, as reported in the August 29, 1997 issue of RISKS Digest.
