Chapter 4. The Cornucopia of the Commons

Dan Bricklin, Cocreator of VisiCalc

Let’s get to the bottom of the Napster phenomenon—why is this music trading service so popular? One could say, trivially, that Napster is successful because you can find what you want (a particular song) and get it easily. It’s also pretty obvious that songs are easy to find because so many of them are available through Napster. If Napster let me get only a few popular songs, once I downloaded those I’d lose interest fast.

But what’s the root cause? Why are so many songs available? Hint: It has nothing to do with peer-to-peer. Peer-to-peer is plumbing, and most people don’t care about plumbing. While the “look into other people’s computers and copy directly” method has some psychological benefit to people who understand what’s going on (as indicated by thinkers such as Tom Matrullo and Dave Winer), I think the peer-to-peer aspects actually get in the way of Napster.

Let’s be blunt: Napster would operate much better if, when you logged in, it uploaded all the songs from your disk that weren’t already in the Napster database. If the songs themselves were copied to a master server, rather than just their names and who was currently logged in, the same songs provided by the same people would be available for download at all times (not just when the “owner” happened to be connected to the Internet), and probably through more reliable and higher-speed connections to the Internet. (Akamai provides the kind of redundancy and efficiency that Napster currently relies on its worldwide network of users to provide.) Napster could at least maintain the list of who has what songs better than it does now.
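
To make that contrast concrete, here is a minimal sketch of the centralized variant just described. It is purely hypothetical: the central_index set and the upload callback are illustrative stand-ins, not anything Napster actually shipped.

```python
# Hypothetical sketch only: on login, upload any local song whose content the
# master server does not already hold. "central_index" and "upload" are
# assumed stand-ins for illustration, not a real Napster interface.
import hashlib
from pathlib import Path


def sync_missing_songs(music_dir, central_index, upload):
    """Send the server every local MP3 it has not seen before."""
    sent = 0
    for mp3 in Path(music_dir).glob("*.mp3"):
        digest = hashlib.sha1(mp3.read_bytes()).hexdigest()
        if digest not in central_index:
            upload(mp3, digest)        # copy the file itself to the master server
            central_index.add(digest)  # from now on it is available around the clock
            sent += 1
    return sent
```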

Napster doesn’t work this way partly because peer-to-peer may be more legal (or so they argue) and harder to litigate against. But other applications may not have Napster’s legal problems and would therefore benefit from more centralized servers. While I’m a strong proponent of peer-to-peer for some things, I don’t think architecture is the main issue driving new services.

The issue is whether you get what you want from the application: “Is the data I want in the database?” What’s interesting about Napster is where its data ultimately comes from—the users—not when or how it’s transferred. So in this chapter, I’m going to examine how a service can fill a database with lots of whatever people want.

Ways to fill shared databases

There are three common ways to fill a shared database: organized manual, organized mechanical, and volunteer manual.

The classic case of an organized manual database is the original Yahoo! directory. This database was filled by organizing an army of people to put in data manually. Another example is the old legal databases where armies of typists were paid to retype printed material into computers.

The original AltaVista is an example of an organized mechanical database. A program running on powerful computers followed links and domain names and spidered the Web, saving the information as it went. Many databases on the Web today are mechanically created by getting access to somebody else’s data, sometimes for a fee. Examples include databases of street maps and the status of airline flights. Some of those databases are by-products of automated processes.

Finally, Usenet newsgroups and threaded discussions like Slashdot are examples of volunteer databases, where interested individuals provide the data because they feel passionate enough about doing so. Amazon.com’s well-known reviews are created through a mixture of organized manual and volunteer manual techniques: the company recruits some reviews and readers spontaneously put up others.

CDDB: A case study in how to get a manually created database

The most interesting databases (for the purposes of this chapter) are the ones that involve manual creation. When we look closely at some of them, we find some very clever techniques for getting data that are very specific to the subjects they cover and the users they serve. Let’s focus on one service that employs a very unusual technique to aggregate its data: the CDDB service offered by Gracenote to organize information about music CDs (http://www.cddb.com).

The CDDB database has information that allows your computer to identify a particular music CD in the CD drive and list its album title and track titles. Their service is used by RealJukebox, MusicMatch, Winamp, and others. What’s interesting is how they accumulate this information that so many users rely on without even thinking about it.

Most CDs do not store title information. The only information on the CD, aside from the audio tracks themselves, is the number of tracks (songs) and the length of each one. This is the information your CD player displays. What CDDB does is let the software on your PC take that track information, send a CD signature to CDDB through Internet protocols (if you’re connected), and get back the titles.

CDDB works because songs are of relatively random length. The chances are good that almost all albums are unique. To understand this point, figure there are about 10 songs on an album, and that they each run from about a minute and a half to about three and a half minutes in length. The times for each song therefore vary over a range of about 100 seconds. There are 100 × 100 × ... × 100 = 100^10 = 10^20 = 100 billion billion = an awful lot of possible combinations. So an album is identified by a signature that is a special arithmetic combination of the times of all the tracks.
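
A rough illustration of the arithmetic follows. This is a toy signature, not Gracenote’s actual disc ID algorithm: it simply hashes the track count and track lengths together and shows how large the space of possible albums is.

```python
# Toy illustration, not CDDB's real algorithm: combine the track count and the
# track lengths (in seconds) into a short signature, and note how large the
# space of possible albums is.
import hashlib


def disc_signature(track_lengths):
    """Combine the number of tracks and each track's length into one ID."""
    data = ",".join(str(n) for n in [len(track_lengths), *track_lengths])
    return hashlib.md5(data.encode()).hexdigest()[:8]


# Ten tracks, each roughly 90-190 seconds long: about 100 choices per track,
# so on the order of 100**10 == 10**20 distinct albums.
print(100 ** 10)  # 100000000000000000000
print(disc_signature([112, 185, 143, 97, 150, 176, 131, 104, 99, 160]))
```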

You’d figure that CDDB just bought a standard database with all the times and titles. Well, there wasn’t one. What they did was accept postings over the Internet that contained track timing information and titles typed in by volunteers. Software for playing music CDs on personal computers was developed that let people type in that information if CDDB didn’t have it. As people noticed that their albums failed to come up with titles when they played them on their PCs, many cared enough to type in the information. They benefited personally from typing the information because they could then more easily make their own playlists, but in the process they happened also to update the shared database. The database could be built even if only one person was willing to do this for each album (even an obscure album).

If you loved your CD collection, you’d want all the albums represented—or at least some people did. Some people are the type who like to be organized and label everything. Not everybody needed to be this type, just enough people to fill the database. Also, the CDDB site needed this volunteer (user) labor only until the database grew large enough that other companies would pay for access.

CDDB is not run on a peer-to-peer architecture. Their database is on dedicated servers that they control. Their web site says:

CDDB is now a totally secure and reliable service which is provided to users worldwide via a network of high availability, mirrored servers which each have multiple, high bandwidth connections to the Internet... boasting a database of nearly 620,000 album titles and over 7.5 million tracks.

So CDDB succeeded not through peer-to-peer networking—it succeeded by harnessing the energy of its users.

Napster: Harnessing the power of personal selfishness

Napster is a manually created database built on work by volunteers. It gets bigger when one of its users buys (or borrows) a copy of a CD, converts it to MP3, and stores it in his or her shared music directory. It can also be enlarged when somebody creates an MP3 of their own performance that they want to share. But Napster cleverly provides a short-circuit around the process of manually creating data: In both cases, storing the copy in the shared music directory can be a natural by-product of the user’s normal work with the songs. It can be done as part of downloading songs to a portable music player or burning a personal mix CD. Whenever the users are connected to the Internet and to the Napster server, songs in the shared directory are then available to the world.
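
A minimal sketch of that by-product effect, assuming a simple announce callback in place of the real client/server exchange: the client just reports whatever the user’s normal activity has left in the shared folder. In contrast to a fully centralized design, only file names are registered, and only while the user stays connected.

```python
# Minimal sketch, not the actual Napster protocol: on connect, report the
# contents of the shared folder to the central index. The "announce" callback
# is an assumed stand-in for the real client/server exchange.
from pathlib import Path


def announce_shared_songs(shared_dir, announce):
    """Tell the index which songs this user happens to have right now."""
    songs = sorted(p.name for p in Path(shared_dir).glob("*.mp3"))
    for name in songs:
        announce(name)  # e.g. register (username, filename) while connected
    return songs
```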

Of course, the user may not be connected to the Napster server all the time, so the song is not fully available to all who want it (a perennial problem with peer-to-peer systems). However, Napster overcomes this problem too, by exploiting the everyday activities of its users. Whenever someone downloads a song using Napster and leaves the file in his or her shared music directory, that person is increasing the number of Napster users who have that song, increasing the chances you will find someone with the song logged in to Napster when you want your copy. So again, the value of the database increases through normal use. (The same kind of replication is achieved in a more formal way by Freenet through its unique protocol, but Napster gets the same effect more simply—its protocol is just the decision of a user to do a download.)
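
A back-of-the-envelope way to see the effect (the 10 percent online figure is an assumption for illustration, not a measured Napster statistic): if each holder of a song is connected a fraction p of the time, the chance that at least one of n holders is online when you look is 1 - (1 - p)^n, which climbs quickly as copies spread.

```python
# Assumed figures for illustration only: if each holder of a song is online
# 10% of the time, the chance that someone has it available when you look
# grows quickly with the number of copies sitting in shared folders.
def availability(holders, online_fraction=0.10):
    """Probability that at least one holder is connected right now."""
    return 1 - (1 - online_fraction) ** holders


for n in (1, 5, 20, 50):
    print(n, round(availability(n), 3))
# 1 -> 0.1, 5 -> 0.41, 20 -> 0.878, 50 -> 0.995
```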

The genius of Napster is that increasing the value of the database by adding more information is a natural by-product of each person using the tool for his or her own benefit. No altruistic sharing motives need be present, especially since sharing is the default. It isn’t even like the old song about “leaving a cup with water by the pump to let the next person have something to prime it with.” (I’ll have to use Napster to find that song....) In other words, nobody has to think of being nice to the next guy or put in even a tiny bit of extra effort.

As Internet analyst Kevin Werbach wrote in Release 1.0, a monthly report on technology trends:

What made Napster a threat to the record labels was its remarkable growth. That growth resulted from two things: Napster’s user experience and its focus on music... What makes Napster different is that it’s drop-dead simple to use. Its interface isn’t pretty, but it achieves that magic resonance with user expectations that marks the most revolutionary software developments.

I would add that, in using that simple, desirable user interface, you also are adding to the value of the database without doing any extra work. I’d like to suggest that one can predict the success of a particular system for building a shared database by how much the database is aided through normal, selfish use.

The commons

We’ve heard plenty about the tragedy of the commons—in fact, it pops up in several other chapters of this book. In the 1968 essay that popularized the concept, “The Tragedy of the Commons,” Garrett Hardin wrote:

Therein is the tragedy. Each man is locked into a system that compels him to increase his herd without limit—in a world that is limited. Ruin is the destination toward which all men rush, each pursuing his own best interest in a society that believes in the freedom of the commons. Freedom in a commons brings ruin to all.

In the case of certain ingeniously planned services, we find a contrasting cornucopia of the commons: use brings overflowing abundance. Peer-to-peer architectures and technologies may have their benefits, but I think the historical lesson is clear: concentrate on what you can get from users, and use whatever protocol can maximize their voluntary contributions. That seems to be where the greatest promise lies for the new kinds of collaborative environments.
