Peer-to-Peer: Harnessing the Power of Disruptive Technologies

By Nelson Minar, Marc Hedlund, Clay Shirky, Tim O'Reilly, Dan Bricklin, David Anderson, Jeremie Miller, Adam Langley, Gene Kan, Alan Brown, Marc Waldman, Lorrie Faith Cranor, Aviel Rubin, Roger Dingledine, Michael Freedman, David Molnar, Rael Dornfest, Dan Brickley, Theodore Hong, Richard Lethin, Jon Udell, Nimisha Asthagiri, Walter Tuvell, Brandon Wiley
Edited by Andy Oram
Price: $29.95 USD
£20.95 GBP


Table of Contents

Chapter 1: A Network of Peers: Peer-to-Peer Models Through the History of the Internet
Nelson Minar and Marc Hedlund, Popular Power
The Internet is a shared resource, a cooperative network built out of millions of hosts all over the world. Today there are more applications than ever that want to use the network, consume bandwidth, and send packets far and wide. Since 1994, the general public has been racing to join the community of computers on the Internet, placing strain on the most basic of resources: network bandwidth. And the increasing reliance on the Internet for critical applications has brought with it new security requirements, resulting in firewalls that strongly partition the Net into pieces. Through rain and snow and congested Network Access Points (NAPs), the email goes through, and the system has scaled vastly beyond its original design.
In the year 2000, though, something has changed—or, perhaps, reverted. The network model that survived the enormous growth of the previous five years has been turned on its head. What was down has become up; what was passive is now active. Through the music-sharing application called Napster, and the larger movement dubbed "peer-to-peer," the millions of users connecting to the Internet have started using their ever more powerful home computers for more than just browsing the Web and trading email. Instead, machines in the home and on the desktop are connecting to each other directly, forming groups and collaborating to become user-created search engines, virtual supercomputers, and filesystems.
Not everyone thinks this is such a great idea. Some objections (dealt with elsewhere in this volume) cite legal or moral concerns. Other problems are technical. Many network providers, having set up their systems with the idea that users would spend most of their time downloading data from central servers, have economic objections to peer-to-peer models. Some have begun to cut off access to peer-to-peer services on the basis that they violate user agreements and consume too much bandwidth (for illicit purposes, at that). As reported by the online News.com site, a third of U.S. colleges surveyed have banned Napster because students using it have sometimes saturated campus networks.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
A revisionist history of peer-to-peer (1969-1995)
The Internet as originally conceived in the late 1960s was a peer-to-peer system. The goal of the original ARPANET was to share computing resources around the U.S. The challenge for this effort was to integrate different kinds of existing networks as well as future technologies with one common network architecture that would allow every host to be an equal player. The first few hosts on the ARPANET—UCLA, SRI, UCSB, and the University of Utah—were already independent computing sites with equal status. The ARPANET connected them together not in a master/slave or client/server relationship, but rather as equal computing peers.
The early Internet was also much more open and free than today's network. Firewalls were unknown until the late 1980s. Generally, any two machines on the Internet could send packets to each other. The Net was the playground of cooperative researchers who generally did not need protection from each other. The protocols and systems were obscure and specialized enough that security break-ins were rare and generally harmless. As we shall see later, the modern Internet is much more partitioned.
The early "killer apps" of the Internet, FTP and Telnet, were themselves client/server applications. A Telnet client logged into a compute server, and an FTP client sent and received files from a file server. But while a single application was client/server, the usage patterns as a whole were symmetric. Every host on the Net could FTP or Telnet to any other host, and in the early days of minicomputers and mainframes, the servers usually acted as clients as well.
This fundamental symmetry is what made the Internet so radical. In turn, it enabled a variety of more complex systems such as Usenet and DNS that used peer-to-peer communication patterns in an interesting fashion. In subsequent years, the Internet has become more and more restricted to client/server-type applications. But as peer-to-peer applications become common again, we believe the Internet must revert to its initial design.
The network model of the Internet explosion (1995-1999)
The explosion of the Internet in 1994 radically changed the shape of the Internet, turning it from a quiet geek utopia into a bustling mass medium. Millions of new people flocked to the Net. This wave represented a new kind of people—ordinary folks who were interested in the Internet as a way to send email, view web pages, and buy things, not computer scientists interested in the details of complex computer networks. The change of the Internet to a mass cultural phenomenon has had a far-reaching impact on the network architecture, an impact that directly affects our ability to create peer-to-peer applications in today's Internet. These changes are seen in the way we use the network, the breakdown of cooperation on the Net, the increasing deployment of firewalls on the Net, and the growth of asymmetric network links such as ADSL and cable modems.
The network model of user applications—not just their consumption of bandwidth, but also their methods of addressing and communicating with other machines—changed significantly with the rise of the commercial Internet and the advent of millions of home users in the 1990s. Modem connection protocols such as SLIP and PPP became more common, typical applications targeted slow-speed analog modems, and corporations began to manage their networks with firewalls and Network Address Translation (NAT). Many of these changes were built around the usage patterns common at the time, most of which involved downloading data, not publishing or uploading information.
The web browser, and many of the other applications that sprang up during the early commercialization of the Internet, were based around a simple client/server protocol: the client initiates a connection to a well-known server, downloads some data, and disconnects. When the user is finished with the data retrieved, the process is repeated. The model is simple and straightforward. It works for everything from browsing the Web to watching streaming video, and developers cram shopping carts, stock transactions, interactive games, and a host of other things into it. The machine running a web client doesn't need to have a permanent or well-known address. It doesn't need a continuous connection to the Internet. It doesn't need to accommodate multiple users. It just needs to know how to ask a question and listen for a response.
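The connect, download, disconnect cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`serve_once`, `fetch`, `run_demo`), the one-line request format, and the document name `index` are all hypothetical, and the "well-known server" is simulated on the local loopback interface rather than on a real host.

```python
from __future__ import annotations

import socket
import threading


def serve_once(listener: socket.socket, documents: dict[str, bytes]) -> None:
    """Server side: accept one client, answer one request, hang up."""
    conn, _addr = listener.accept()
    with conn:
        # The request is a single short line naming the wanted document.
        name = conn.recv(1024).decode("utf-8").strip()
        conn.sendall(documents.get(name, b"not found"))


def fetch(host: str, port: int, name: str) -> bytes:
    """Client side: connect, ask one question, read the answer, disconnect."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(name.encode("utf-8") + b"\n")
        sock.shutdown(socket.SHUT_WR)  # signal that the question is complete
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)


def run_demo() -> bytes:
    """Start a throwaway server on an ephemeral local port and query it."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(5)  # avoid waiting forever if no client arrives
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]
    documents = {"index": b"hello from the server"}
    worker = threading.Thread(target=serve_once, args=(listener, documents))
    worker.start()
    try:
        return fetch("127.0.0.1", port, "index")
    finally:
        worker.join()
        listener.close()
```

Note that only the server needs an address the other side knows in advance. The client picks whatever local port the operating system hands it, which is why this model works for machines with no permanent address.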
Observations on the current crop of peer-to-peer applications (2000)
While the new breed of peer-to-peer applications can take lessons from earlier models, these applications also introduce novel characteristics. Peer-to-peer allows us to separate the concepts of authoring information and publishing that same information. Peer-to-peer allows for decentralized application design, something that is both an opportunity and a challenge. And peer-to-peer applications place unique strains on firewalls, something well demonstrated by the current trend to use the HTTP port for operations other than web transactions.
One of the promises of the Internet is that people are able to be their own publishers, for example, by using personal web sites to make their views and interests known. Self-publishing has certainly become more common with the commercialization of the Internet. More often, however, users spend most of their time reading (downloading) information and less time publishing, and as discussed previously, commercial providers of Internet access have structured their offerings around this asymmetry.
The example of Napster creates an interesting middle ground between the ideal of "everyone publishes" and the seeming reality of "everyone consumes." Napster particularly (and famously) makes it very easy to publish data you did not author. In effect, your machine is being used as a repeater to retransmit data once it reaches you. A network designer, assuming that there are only so many authors in the world and therefore that asymmetric broadband is the perfect optimization, is confounded by this development. This is why many networks such as college campuses have banned Napster from use.
Napster changes the flow of data. The assumptions that servers would be owned by publishers and that publishers and authors would combine into a single network location have proven untrue. The same observation also applies to Gnutella, Freenet, and others. Users don't need to create content in order to want to publish it—in fact, the benefits of publication by the "reader" have been demonstrated by the scale some of these systems have been able to reach.
Peer-to-peer prescriptions (2001-?)
The story is clear: The Internet was designed with peer-to-peer applications in mind, but as it has grown the network has become more asymmetric. What can we do to permit new peer-to-peer applications to flourish while respecting the pressures that have shaped the Internet to date?
As we have seen, the explosion of the Internet into the consumer space brought with it changes that have made it difficult to do peer-to-peer networking. Firewalls make it hard to contact hosts; dynamic IP and NAT make it nearly impossible. Asymmetric bandwidth is holding users back from efficiently serving files on their systems. Current peer-to-peer applications generally would benefit from an Internet more like the original network, where these restrictions were not in place. How can we enable peer-to-peer applications to work better with the current technological situation?
Firewalls serve an important need: they allow administrators to express and enforce policies about the use of their networks. That need will not change with peer-to-peer applications. Neither application designers nor network security administrators are benefiting from the current state of affairs. The solution lies in making firewalls smarter so that peer-to-peer applications can cooperate with the firewall to allow traffic the administrator wants. Firewalls must become more sophisticated, allowing systems behind the firewall to ask permission to run a particular peer-to-peer application. Peer-to-peer designers must contribute to this design discussion, then enable their applications to use these mechanisms. There is a good start to this solution in the SOCKS protocol, but it needs to be expanded to be more flexible and tied more closely to applications than to simple port numbers.
The problems engendered by dynamic IP and NAT already have a technical solution: IPv6. This new version of IP, the next generation Internet protocol architecture, has a 128-bit address space—enough for every host on the Internet to have a permanent address. Eliminating address scarcity means that every host has a home and, in theory, can be reached. The main thing holding up the deployment of IPv6 is the complexity of the changeover. At this stage, it remains to be seen when or even if IPv6 will be commonly deployed, but without it peer-to-peer applications will continue to need to build alternate address spaces to work around the limitations set by NAT and dynamic IP.
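To put the 128-bit figure in perspective, the arithmetic can be worked out directly. The following is a small sketch: the 32-bit width of today's IPv4 addresses is a well-known fact not stated in the text above, and the variable and function names are illustrative only.

```python
import ipaddress

IPV4_BITS = 32    # width of a current (IPv4) address
IPV6_BITS = 128   # width of an IPv6 address, as stated in the text


def address_count(bits: int) -> int:
    """Number of distinct addresses expressible in the given bit width."""
    return 2 ** bits


ipv4_total = address_count(IPV4_BITS)
ipv6_total = address_count(IPV6_BITS)

# Cross-check against the standard library's view of each full address space.
assert ipv4_total == ipaddress.ip_network("0.0.0.0/0").num_addresses
assert ipv6_total == ipaddress.ip_network("::/0").num_addresses

# How many IPv6 addresses exist for every single IPv4 address.
ipv6_per_ipv4 = ipv6_total // ipv4_total

print(f"IPv4 addresses: {ipv4_total:,}")
print(f"IPv6 addresses: {ipv6_total:,}")
print(f"IPv6 addresses per IPv4 address: 2**{IPV6_BITS - IPV4_BITS}")
```

The IPv4 space holds about 4.3 billion addresses in total, which is why they are handed out dynamically or shared behind NAT. The IPv6 space is larger by a factor of 2 to the 96th power.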
Conclusions
The Internet started out as a fully symmetric, peer-to-peer network of cooperating users. As the Net has grown to accommodate the millions of people flocking online, technologies have been put in place that have split the Net up into a system with relatively few servers and many clients. At the same time, some of the basic expectations of cooperation are at risk of breaking down, threatening the structure of the Net.
These phenomena pose challenges and obstacles to peer-to-peer applications: both the network and the applications have to be designed together to work in tandem. Application authors must design robust applications that can function in the complex Internet environment, and network designers must build in capabilities to handle new peer-to-peer applications. Fortunately, many of these issues are familiar from the experience of the early Internet; the lessons learned there can be brought forward to design tomorrow's systems.
Chapter 2: Listening to Napster
Clay Shirky, The Accelerator Group
Premature definition is a danger for any movement. Once a definitive label is applied to a new phenomenon, it invariably begins shaping—and possibly distorting—people's views. So it is with the present movement toward decentralized applications. After a year or so of attempting to describe the revolution in file sharing and related technologies, we have finally settled on peer-to-peer as a label for what's happening.
Somehow, though, this label hasn't clarified things. Instead, it's distracted us from the phenomena that first excited us. Taken literally, servers talking to one another are peer-to-peer. The game Doom is peer-to-peer. There are even people applying the label to email and telephones. Meanwhile, Napster, which jump-started the conversation, is not peer-to-peer in the strictest sense, because it uses a centralized server to store pointers and resolve addresses.
If we treat peer-to-peer as a literal definition of what's happening, we end up with a phrase that describes Doom but not Napster and suggests that Alexander Graham Bell is a peer-to-peer engineer but Shawn Fanning is not. Eliminating Napster from the canon now that we have a definition we can apply literally is like saying, "Sure, it may work in practice, but it will never fly in theory."
This literal approach to peer-to-peer is plainly not helping us understand what makes it important. Merely having computers act as peers on the Internet is hardly novel. From the early days of PDP-11s and Vaxes to the Sun SPARCs and Windows 2000 systems of today, computers on the Internet have been peering with each other. So peer-to-peer architecture itself can't be the explanation for the recent changes in Internet use.
What have changed are the nodes that make up these peer-to-peer systems—Internet-connected PCs, which formerly were relegated to being nothing but clients—and where these nodes are: at the edges of the Internet, cut off from the DNS (Domain Name System) because they have no fixed IP addresses.
Resource-centric addressing for unstable environments
Peer-to-peer is a class of applications that takes advantage of resources—storage, cycles, content, human presence—available at the edges of the Internet. Because accessing these decentralized resources means operating in an environment of unstable connectivity and unpredictable IP addresses, peer-to-peer nodes must operate outside the DNS and have significant or total autonomy from central servers.
That's it. That's what makes peer-to-peer distinctive.
Note that this isn't what makes peer-to-peer important. It's not the problem designers of peer-to-peer systems set out to solve, like aggregating CPU cycles, sharing files, or chatting. But it's a problem they all had to solve to get where they wanted to go.
What makes Napster and Popular Power and Freenet and AIMster and Groove similar is that they are all leveraging previously unused resources, by tolerating and even working with variable connectivity. This lets them make new, powerful use of the hundreds of millions of devices that have been connected to the edges of the Internet in the last few years.
One could argue that the need for peer-to-peer designers to solve connectivity problems is little more than an accident of history. But improving the way computers connect to one another was the rationale behind the 1984 design of the Domain Name System (DNS), and before that the Internet Protocol (IP), and before that the Transmission Control Protocol (TCP), and before that the Net itself. The Internet is made of such frozen accidents.
So if you're looking for a litmus test for peer-to-peer, this is it:
  1. Does it allow for variable connectivity and temporary network addresses?
  2. Does it give the nodes at the edges of the network significant autonomy?
If the answer to both of those questions is yes, the application is peer-to-peer. If the answer to either question is no, it's not peer-to-peer.
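The two-question test is a simple conjunction, and it can be written down as one. In this sketch the class name, field names, and the three example applications are hypothetical and chosen only to exercise each branch; they are not classifications of real products.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppProfile:
    name: str
    handles_variable_connectivity: bool  # question 1: variable connectivity, temporary addresses
    edge_nodes_autonomous: bool          # question 2: significant autonomy at the edges


def is_peer_to_peer(app: AppProfile) -> bool:
    """Peer-to-peer only if the answer to BOTH questions is yes."""
    return app.handles_variable_connectivity and app.edge_nodes_autonomous


# Hypothetical applications, one per interesting combination of answers.
candidates = [
    AppProfile("edge-sharing-app", True, True),
    AppProfile("fixed-address-mesh", False, True),
    AppProfile("thin-client-portal", True, False),
]

verdicts = {app.name: is_peer_to_peer(app) for app in candidates}
for name, verdict in verdicts.items():
    print(f"{name}: {'peer-to-peer' if verdict else 'not peer-to-peer'}")
```

A single "no" is enough to fail the test, which is what separates it from the literal reading of the term: servers with fixed addresses talking to one another fail question 1 even though they are peers.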
Follow the users
It seems obvious but bears repeating: Definitions are useful only as tools for sharpening one's perception of reality and improving one's ability to predict the future. Whatever one thinks of Napster's probable longevity, Napster is the killer app for this revolution.
If the Internet has taught technology watchers anything, it's that predictions of the future success of a particular software method or paradigm are of tenuous accuracy at best. Consider the history of "multimedia." If you had read almost any computer trade magazine or followed any technology analyst's predictions for the rise of multimedia in the early '90s, the future they predicted was one of top-down design, and this multimedia future was to be made up of professionally produced CD-ROMs and "walled garden" online services such as CompuServe and Delphi. And then the Web came along and let absolute amateurs build pages in HTML, a language that was laughably simple compared to the tools being developed for other multimedia services.
HTML's simplicity, which let amateurs create content for little cost and little invested time, turned out to be HTML's long suit. Between 1993 and 1995, HTML went from an unknown markup language to the preeminent tool for designing electronic interfaces, decisively displacing almost all challengers and upstaging CD-ROMs, as well as online services and a dozen expensive and abortive experiments with interactive TV. It did this while having no coordinated authority, no central R&D effort, and no discernible financial incentive for the majority of its initial participants.
What caught the tech watchers in the industry by surprise was that HTML was made a success not by corporations but by users. The obvious limitations of the Web for professional designers blinded many to HTML's ability to allow average users to create multimedia content.
HTML spread because it allowed ordinary users to build their own web pages, without requiring that they be software developers or even particularly savvy software users. All the confident predictions about the CD-ROM-driven multimedia future turned out to be meaningless in the face of user preference. This in turn led to network effects on adoption: once a certain number of users had adopted it, there were more people committed to making the Web better than there were people committed to making CD-ROM authoring easier for amateurs.
Where's the content?
Napster's success in pursuing this strategy is difficult to overstate. At any given moment, Napster servers keep track of thousands of PCs holding millions of songs comprising several terabytes of data. This is a complete violation of the Web's data model, "Content at the Center," and Napster's success in violating it could be labeled "Content at the Edges."
The content-at-the-center model has one significant flaw: most Internet content is created on the PCs at the edges, but for it to become universally accessible, it must be pushed to the center, to always-on, always-up web servers. As anyone who has ever spent time trying to upload material to a web site knows, the Web has made downloading trivially easy, but uploading is still needlessly hard. Napster dispenses with uploading and leaves the files on the PCs, merely brokering requests from one PC to another—the MP3 files do not have to travel through any central Napster server. Instead of trying to store these files in a central database, Napster took advantage of the largest pool of latent storage space in the world—the disks of the Napster users. And thus, Napster became the prime example of a new principle for Internet applications: Peer-to-peer services come into being by leveraging the untapped power of the millions of PCs that have been connected to the Internet in the last five years.
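The brokering arrangement described above, in which a central server holds only pointers while the files stay on users' disks, can be sketched roughly as follows. This is a toy model and not Napster's actual protocol: the class and function names are invented for illustration, the addresses come from a range reserved for documentation, and the "network" is just a dictionary of in-memory objects.

```python
from __future__ import annotations

from collections import defaultdict


class CentralIndex:
    """Stores only pointers (title -> peer addresses), never file contents."""

    def __init__(self) -> None:
        self._holders: dict[str, set[str]] = defaultdict(set)

    def announce(self, peer_address: str, titles: list[str]) -> None:
        for title in titles:
            self._holders[title].add(peer_address)

    def locate(self, title: str) -> list[str]:
        return sorted(self._holders.get(title, set()))


class Peer:
    """A PC at the edge: the files live here, on the user's own disk."""

    def __init__(self, address: str, files: dict[str, bytes]) -> None:
        self.address = address
        self.files = dict(files)

    def serve(self, title: str) -> bytes:
        return self.files[title]


def download(index: CentralIndex, network: dict[str, Peer],
             requester: Peer, title: str) -> bytes:
    """Ask the index who has the file, then fetch it directly from that peer."""
    holders = [a for a in index.locate(title) if a != requester.address]
    if not holders:
        raise LookupError(f"no peer currently offers {title!r}")
    data = network[holders[0]].serve(title)      # bytes bypass the index
    requester.files[title] = data
    index.announce(requester.address, [title])   # requester now hosts a copy
    return data


index = CentralIndex()
alice = Peer("192.0.2.10", {"song-a": b"audio-bytes"})
bob = Peer("192.0.2.20", {})
network = {peer.address: peer for peer in (alice, bob)}

index.announce(alice.address, list(alice.files))
received = download(index, network, bob, "song-a")
print(index.locate("song-a"))
```

Nothing is ever uploaded to the center. After the transfer the index lists two holders for the same title, so each download adds a potential source.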
Napster's popularity made it the proof-of-concept application for a new networking architecture based on the recognition that bandwidth to the desktop had become fast enough to allow PCs to serve data as well as request it, and that PCs are becoming powerful enough to fulfill this new role. Just as the application service provider (ASP) model is taking off, Napster's success represents the revenge of the PC. By removing the need to upload data (the single biggest bottleneck to the ASP model), Napster points the way to a reinvention of the desktop as the center of a user's data—only this time the user will no longer need physical access to the PC.
Nothing succeeds like address, or, DNS isn't the only game in town
The early peer-to-peer designers, realizing that interesting services could be run off of PCs if only they had real addresses, simply ignored DNS and replaced the machine-centric model with a protocol-centric one. Protocol-centric addressing creates a parallel namespace for each piece of software. AIM and Napster usernames are mapped to temporary IP addresses not by the Net's DNS servers, but by privately owned servers dedicated to each protocol: the AIM server matches AIM names to the users' current IP addresses, and so on.
In Napster's case, protocol-centric addressing turns Napster into merely a customized FTP for music files. The real action in new addressing schemes lies in software like AIM, where the address points to a person, not a machine. When you log into AIM, the address points to you, no matter what machine you're sitting at, and no matter what IP address is presently assigned to that machine. This completely decouples what humans care about—Can I find my friends and talk with them online?—from how the machines go about it—Route packet A to IP address X.
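A minimal sketch of protocol-centric addressing might look like the following. It assumes a single privately run name service per protocol that maps a username to whatever IP address that user most recently logged in from. The class name, method names, username, and addresses are all hypothetical and do not describe AIM's or Napster's real interfaces.

```python
from __future__ import annotations


class NameService:
    """One private namespace per protocol: username -> current IP address."""

    def __init__(self, protocol: str) -> None:
        self.protocol = protocol
        self._current: dict[str, str] = {}

    def login(self, username: str, ip_address: str) -> None:
        # Logging in from a new machine simply overwrites the old mapping.
        self._current[username] = ip_address

    def logout(self, username: str) -> None:
        self._current.pop(username, None)

    def resolve(self, username: str) -> str | None:
        """Return the user's current address, or None if they are offline."""
        return self._current.get(username)


chat = NameService("example-chat")

chat.login("pat", "192.0.2.10")       # Pat signs in from one machine
at_first = chat.resolve("pat")

chat.login("pat", "198.51.100.7")     # later, from a different machine
after_move = chat.resolve("pat")

chat.logout("pat")
when_offline = chat.resolve("pat")

print(at_first, after_move, when_offline)
```

The name stays constant while the address underneath it changes, and the update takes effect immediately because it never passes through DNS.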
This is analogous to the change in telephony brought about by mobile phones. In the same way that a phone number is no longer tied to a particular physical location but is dynamically mapped to the location of the phone's owner, an AIM address is mapped to you, not to a machine, no matter where you are.
This does not mean that DNS is going away, any more than landlines went away with the invention of mobile telephony. It does mean that DNS is no longer the only game in town. The rush is now on, with instant messaging protocols, single sign-on and wallet applications, and the explosion in peer-to-peer businesses, to create and manage protocol-centric addresses that can be instantly updated.
Nor is this change in the direction of easier peer-to-peer addressing entirely to the good. While it is always refreshing to see people innovate their way around a bottleneck, sometimes bottlenecks are valuable. While AIM and Napster came to their addressing schemes honestly, any number of people have noticed how valuable it is to own a namespace, and many business plans making the rounds are just me-too copies of Napster or AIM. Eventually, the already growing list of kinds of addresses—phone, fax, email, URL, AIM,
Additional content appearing in this section has been removed.
An economic rather than legal challenge
Much has been made of the use of Napster for what the music industry would like to define as "piracy." Even though the dictionary definition of piracy is quite broad, this is something of a misnomer, because pirates are ordinarily in business to sell what they copy. Not only do Napster users not profit from making copies available, but Napster works precisely because the copies are free. (Its recent business decision to charge a monthly fee for access doesn't translate into profits for the putative "pirates" at the edges.)
Napster does more than just evade the law; it also upends the economics of the music industry. By extension, peer-to-peer systems are changing the economics of storing and transmitting intellectual property in general.
The resources Napster is brokering between users have one of two characteristics: they are either replicable or replenishable.
Replicable resources include the MP3 files themselves. "Taking" an MP3 from another user involves no loss (if I "take" an MP3 from you, it is not removed from your hard drive)—better yet, it actually adds resources to the Napster universe by allowing me to host an alternate copy. Even if I am a freeloader and don't let anyone else copy the MP3 from me, my act of taking an MP3 has still not caused any net loss of MP3s.
Other important resources, such as bandwidth and CPU cycles (as in the case of systems like SETI@home), are not replicable, but they are replenishable. The resources can be neither depleted nor conserved. Bandwidth and CPU cycles expire if they are not used, but they are immediately replenished. Thus they cannot be conserved in the present and saved for the future, but they can't be "used up" in any long-term sense either.
Because of these two economic characteristics, the exploitation of otherwise unused bandwidth to copy MP3s across the network means that additional music can be created at almost zero marginal cost to the user. It employs resources—storage, cycles, bandwidth—that the users have already paid for but are not fully using.
Peer-to-peer architecture and second-class status
With this change in addressing schemes and the renewed importance of the PC chassis, peer-to-peer is not merely erasing the distinction between client and server. It's erasing the distinction between consumer and provider as well. You can see the threat to the established order in a recent legal action: a San Diego cable ISP, Cox@Home, ordered several hundred customers to stop running Napster not because they were violating copyright laws, but because Napster leads Cox subscribers to use too much of its cable network bandwidth.
Cox built its service on the current web architecture, where producers serve content from always-connected servers at the Internet's center and consumers consume from intermittently connected client PCs at the edges. Napster, on the other hand, inaugurated a model where PCs are always on and always connected, where content is increasingly stored and served from the edges of the network, and where the distinction between client and server is erased. Cox v. Napster isn't just a legal fight; it's a fight between a vision of helpless, passive consumers and a vision where people at the network's edges can both consume and produce.
The question of the day is, "Can Cox (or any media business) force its users to retain their second-class status as mere consumers of information?" To judge by Napster's growth, the answer is "No."
The split between consumers and providers of information has its roots in the Internet's addressing scheme. Cox assumed that the model ushered in by the Web—in which users never have a fixed IP address, so they can consume data stored elsewhere but never provide anything from their own PCs—was a permanent feature of the landscape. This division wasn't part of the Internet's original architecture, and the proposed fix (the next generation of IP, called IPv6) has been coming Real Soon Now for a long time. In the meantime, services like Cox have been built with the expectation that this consumer/provider split would remain in effect for the foreseeable future.
Chapter 3: Remaking the Peer-to-Peer Meme
Tim O'Reilly, O'Reilly & Associates
On September 18, 2000, I organized a so-called "peer-to-peer summit" to explore the bounds of peer-to-peer networking. In my invitation to the attendees, I set out three goals:
  1. To make a statement, by their very coming together, about the nature of peer-to-peer and what kinds of technologies people should think of when they hear the term.
  2. To make some introductions among people whom I like and respect and who are working on different aspects of what could be seen as the same problem—peer-to-peer solutions to big problems—in order to create some additional connections between technical communities that ought to be talking to and learning from each other.
  3. To do some brainstorming about the issues each of us is uncovering, so we can keep projects from reinventing the wheel and foster cooperation to accelerate mutual growth.
In organizing the summit, I was thinking of the free software (open source) summit I held a few years back. Like free software at that time, peer-to-peer currently has image problems and a difficulty developing synergy. The people I was talking to all knew that peer-to-peer is more than just swapping music files, but the wider world was still focusing largely on the threats to copyright. Even people working in the field of peer-to-peer have trouble seeing how far its innovations can extend; it would benefit them to learn how many different types of technologies share the same potential and the same problems.
This is exactly what we did with the open source summit. By bringing together people from a whole lot of projects, we were able to get the world to recognize that free software was more than GNU and Linux; we introduced a lot of people, many of whom, remarkably, had never met; we talked shop; and ultimately, we crafted a new "meme" that completely reshaped the way people thought about the space.
From business models to meme maps
Recently, I started working with Dan and Meredith Beam of Beam, Inc., a strategy consulting firm. Dan and Meredith help companies build their "business models": one-page pictures that describe "how all the elements of a business work together to build marketplace advantage and company value." It's easy to conclude that two companies selling similar products and services are in the same business, but the Beams think otherwise.
For example, O'Reilly and IDG compete in the computer book publishing business, but we have completely different business models. Their strategic positioning is to appeal to the "dummy" who needs to learn about computers but doesn't really want to. Ours is to appeal to the people who love computers and want to go as deep as possible. Their marketing strategy is to build a widely recognized consumer brand, and then dominate retail outlets and "big box" stores in hopes of putting product in front of consumers who might happen to walk by in search of any book on a given subject. Our marketing strategy is to build awareness of our brand and products in the core developer and user communities, who then buy directly or drive traffic to retail outlets. The former strategy pushes product into distribution channels in an aggressive bid to reach unknown consumers; the latter pulls products into distribution channels as they are requested by consumers who are already looking for the product. Both companies are extremely successful, but our different business models require different competencies. I won't say more lest this chapter turn into a lesson for O'Reilly competitors, but hopefully I have said enough to get the idea across.
Boiling all the elements of your business down into a one-page picture is a really useful exercise. But what is even more useful is that Dan and Meredith have you run the exercise twice, once to describe your present business, and once to describe it as you want it to be.
At any rate, fresh from the strategic planning process at O'Reilly, it struck me that an adaptation of this idea would be useful preparation for the summit. We weren't modeling a single business but a technology space—the key projects, concepts, and messages associated with it.
Chapter 4: The Cornucopia of the Commons
Dan Bricklin, Cocreator of VisiCalc
Let's get to the bottom of the Napster phenomenon—why is this music trading service so popular? One could say, trivially, that Napster is successful because you can find what you want (a particular song) and get it easily. It's also pretty obvious that songs are easy to find because so many of them are available through Napster. If Napster let me get only a few popular songs, once I downloaded those I'd lose interest fast.
But what's the root cause? Why are so many songs available? Hint: It has nothing to do with peer-to-peer. Peer-to-peer is plumbing, and most people don't care about plumbing. While the "look into other people's computers and copy directly" method has some psychological benefit to people who understand what's going on (as indicated by thinkers such as Tom Matrullo and Dave Winer), I think the peer-to-peer aspects actually get in the way of Napster.
Let's be blunt: Napster would operate much better if, when you logged in, it uploaded all the songs from your disk that weren't already in the Napster database. If the songs were copied to a master server, rather than just the names of the songs and who was currently logged in, the same songs would be available for download provided by the same people, but at all times (not just when the "owner" happened to be connected to the Internet), and probably through more reliable and higher-speed connections to the Internet. (Akamai provides the kind of redundancy and efficiency that Napster currently relies on its worldwide network of users to provide.) Napster could at least maintain the list of who has what songs better than it does now.
Napster doesn't work this way partly because peer-to-peer may be more legal (or so they argue) and harder to litigate against. But other applications may not have Napster's legal problems and would therefore benefit from more centralized servers. While I'm a strong proponent of peer-to-peer for some things, I don't think architecture is the main issue driving new services.
Ways to fill shared databases
There are three common ways to fill a shared database: organized manual, organized mechanical, and volunteer manual.
The classic case of an organized manual database is the original Yahoo! directory. This database was filled by organizing an army of people to put in data manually. Another example is the old legal databases where armies of typists were paid to retype printed material into computers.
The original AltaVista is an example of an organized mechanical database. A program running on powerful computers followed links and domain names and spidered the Web, saving the information as it went. Many databases on the Web today are mechanically created by getting access to somebody else's data, sometimes for a fee. Examples include databases of street maps and the status of airline flights. Some of those databases are by-products of automated processes.
Finally, Usenet newsgroups and threaded discussions like Slashdot are examples of volunteer databases, where interested individuals provide the data because they feel passionate enough about doing so. Amazon.com's well-known reviews are created through a mixture of organized manual and volunteer manual techniques: the company recruits some reviews and readers spontaneously put up others.
The most interesting databases (for the purposes of this chapter) are the ones that involve manual creation. When we look closely at some of them, we find some very clever techniques for getting data that are very specific to the subjects they cover and the users they serve. Let's focus on one service that employs a very unusual technique to aggregate its data: the CDDB service offered by Gracenote to organize information about music CDs (http://www.cddb.com).
The CDDB database has information that allows your computer to identify a particular music CD in the CD drive and list its album title and track titles. Their service is used by RealJukebox, MusicMatch, Winamp, and others. What's interesting is how they accumulate this information that so many users rely on without even thinking about it.
Chapter 5: SETI@home
David Anderson, SETI@home
It was January 1986, and I was sitting in a cafe on Berkeley, California's Telegraph Avenue. Looking up, I recognized a student in the graduate course I was teaching that semester at the university. We talked. His name was David Gedye, and he had just arrived from Australia. Our conversation revealed many common interests, both within and outside of computer science. This chance meeting led, twelve years later, to a project that may revolutionize computing and science: SETI@home.
Gedye and I became running partners. Our long forays into the hills above the Berkeley campus occasioned many far-ranging discussions about the universe and our imperfect understanding of it. I enjoyed these times. But all good things must end, and in 1989 Gedye left Berkeley with a master's degree. He worked in Silicon Valley for a few years, then moved to Seattle and started a family. I also left academia, but remained in the Bay Area.
In 1995 Gedye visited me in Berkeley, and we returned to the hills, this time for a leisurely walk. He was bursting with excitement about a new idea. It sounded crazy at first: He proposed using the computing power of home PCs to search for radio signals from extraterrestrial civilizations. But Gedye was serious. He had contacted Woody Sullivan, an astronomy professor at the University of Washington and an expert in the theory behind SETI, the Search for Extraterrestrial Intelligence. Woody had steered him to Dan Werthimer, a SETI researcher at UC Berkeley.
The four of us—Gedye, Werthimer, Sullivan, and I—met several times over the next year, trying to assess the viability of Gedye's idea. We decided that existing technology was sufficient, though just barely, for recording radio data and distributing it over the Internet. And if we managed to get 100,000 people to participate, the aggregate computing power would let us search for fainter signals, and more types of signals, than had ever been done before. But could we get that many people interested? We decided to try it and find out.
Radio SETI
SETI is a scientific research area whose goal is to detect intelligent life outside the Earth. In 1959, Phil Morrison and Giuseppe Cocconi proposed listening for signals with narrow frequency bandwidth, like our own television and radar emissions, but unlike the noise emanating from stars and other natural sources. Such signals would be evidence of technology, and therefore of life.
The first radio SETI experiment was conducted in 1960 by Frank Drake, who pointed an 85-foot radio telescope in West Virginia at two nearby stars. Drake didn't detect an extraterrestrial signal, but he and other researchers have continued to listen. Since 1960 there have been tremendous advances in technology, especially in the digital technology at the heart of radio SETI. The systems that analyze radio signals use the Fast Fourier Transform (FFT), an algorithm that divides signals into their component frequencies. Most SETI projects have built special-purpose FFT supercomputers, but are limited to fairly simple types of analysis.
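To make the role of the Fourier transform concrete, here is a minimal sketch of how a narrowband tone stands out once a signal is split into frequency bins. It is not the SETI@home or SERENDIP analysis code: it uses a direct (slow) discrete Fourier transform rather than an FFT, and the function name `dft_power`, the 64-sample length, and the test signal are all assumptions made up for illustration.

```python
import cmath
import math

def dft_power(samples):
    """Return the power in each frequency bin, using a direct O(n^2) DFT.

    An FFT computes the same bins far faster; the slow form is used here
    only to keep the sketch short and dependency-free.
    """
    n = len(samples)
    powers = []
    for k in range(n):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        powers.append(abs(acc) ** 2 / n)
    return powers

# Hypothetical test signal: a tone completing 5 cycles over 64 samples,
# plus a small deterministic wobble standing in for broadband noise.
N = 64
TONE_BIN = 5
signal = [
    math.cos(2 * math.pi * TONE_BIN * t / N) + 0.1 * ((t * 7919) % 13 - 6) / 6
    for t in range(N)
]

powers = dft_power(signal)
# For a real-valued signal only bins 0..N/2 are distinct.
peak_bin = max(range(N // 2 + 1), key=lambda k: powers[k])
print(peak_bin)
```

The tone's energy piles up in a single bin while the wobble is spread thinly across all of them, which is why a narrowband signal can be picked out even when it is faint in the time domain.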
There are also larger and more sensitive radio telescopes. The largest is Arecibo, a 1,000-foot aluminum dish set into a natural hollow in the hills of northern Puerto Rico. A movable antenna platform is suspended 700 feet above the center of the dish. By moving the antenna, one can effectively point the telescope anywhere in a band of sky from the celestial equator to 38 degrees north. The telescope doesn't form an image like optical telescopes. It's more like a highly directional microphone. It sees a fuzzy disk (a beam) about 1⁄10 of a degree in diameter, or about 1⁄5 the diameter of the moon.
Arecibo's size and excellent electronics let it hear very faint signals. The telescope is used for many scientific purposes: looking for pulsars, imaging asteroids and planets by bouncing radio waves off them, and studying the upper atmosphere. Observation time on Arecibo is a precious commodity.
In 1992, Dan Werthimer devised a way for his SETI project, SERENDIP, to use Arecibo all the time—even while other projects are using it. He mounted a secondary antenna at the opposite end of the platform from the main antenna. While the main antenna tracks a fixed point in the sky (as it normally does) this secondary antenna moves slowly in an arc about 6 degrees away. SERENDIP observers have no control over where the scope points, but over long periods of time their beam covers the entire band of sky visible from Arecibo. SERENDIP is thus a
Additional content appearing in this section has been removed.
How SETI@home works
We decided that SETI@home would use SERENDIP's antenna. Like all previous radio SETI projects, SERENDIP analyzes its signal using a dedicated supercomputer at the telescope; it doesn't record the signal. For SETI@home, we needed to digitally record the signal and transport it to our computers at Berkeley. The network connection from Arecibo to the mainland is too slow. Instead, we record the data on digital tapes and mail them to Berkeley. The largest-capacity digital tape available in 1998 was the 35-GB digital linear tape (DLT).
We had to decide what frequency range to record. Covering a wide range is good from a scientific point of view, but it means more tapes and more network bandwidth. We decided to record a 2.5 MHz frequency band. Using 1-bit samples, this gives a data rate of 5 Mbps, meaning that a tape fills up in about 16 hours. Like most radio SETI projects, we centered our band at the hydrogen line, 1.42 GHz. This is the resonant frequency of the hydrogen atoms that fill interstellar space. Since hydrogen is the most abundant element in the universe, we hope that if aliens are sending an intentional signal, they will use this frequency. Our 2.5 MHz band is wide enough to contain Doppler shifts (frequency shifts due to relative motion) corresponding to any likely velocity of a transmitter in our galaxy.
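The data-rate and tape figures can be checked with a few lines of arithmetic. This sketch assumes sampling at the Nyquist rate (two 1-bit samples per hertz of bandwidth), which the text does not spell out, and takes 1 GB as 10^9 bytes; the variable names are illustrative.

```python
BANDWIDTH_HZ = 2.5e6          # recorded band, from the text
BITS_PER_SAMPLE = 1           # from the text
TAPE_CAPACITY_BYTES = 35e9    # 35-GB DLT; assumes 1 GB = 10^9 bytes

# Assumption: Nyquist sampling, two samples per hertz of bandwidth.
samples_per_second = 2 * BANDWIDTH_HZ
data_rate_bps = samples_per_second * BITS_PER_SAMPLE

tape_fill_seconds = TAPE_CAPACITY_BYTES * 8 / data_rate_bps
tape_fill_hours = tape_fill_seconds / 3600

print(f"data rate: {data_rate_bps / 1e6:.1f} Mbps")
print(f"tape fills in about {tape_fill_hours:.1f} hours")
```

Under these assumptions the rate comes out to 5 Mbps and a tape lasts a little under 16 hours, in line with the figures quoted above.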
SETI@home and SERENDIP are complementary: SETI@home looks at a narrower frequency range than SERENDIP (2.5 MHz versus 140 MHz) but does better signal analysis. SETI@home will record data for two years, during which time we'll cover Arecibo's visible band about four times.
Every week about ten newly-recorded tapes arrive from Arecibo. These tapes are catalogued and stored. Next, the data is divided into work units, the pieces that are sent to clients. The data is divided along two dimensions: time and frequency. We decided that work units should be about 0.3 MB: large enough to keep a computer busy for a while, but small enough so that, even over a 28.8-Kbps modem, the transmission time is only a few minutes. We wanted each work unit to cover several times the
Additional content appearing in this section has been removed.
Trials and tribulations
SETI@home has faced many difficulties and challenges. Server performance, for example, has been a major problem. As more and more people downloaded and ran the client, the stream of client requests grew from a trickle to a torrent. At first, our server system consisted of three pieces: an Informix database server, the data distribution server, and an Apache web server. These ran on three Sun workstations, which also served as our personal computers.
In the first week the server system was quickly overwhelmed. Client connections were being turned away, resulting in irritating error messages being displayed to users, and hence a torrent of email.
We scrambled to fix these problems by modifying the software. For example, we realized that much of the load on the database server was due to updating lots of accounting records (for countries, CPU types, teams, etc.) for each result received. We hastily revised the system to update the accounting records off-line, combining thousands of database writes into a single write. This offline system quickly fell behind, producing yet another wave of irate email, but at least the data distribution server now kept up.
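The idea of moving accounting updates off-line can be sketched as follows. This is not the SETI@home server code: the class `AccountingBatcher`, its method names, and the `write_batch` callback (standing in for whatever performs the single bulk database write) are hypothetical names chosen for illustration.

```python
from collections import Counter

class AccountingBatcher:
    """Accumulate per-result accounting increments in memory, then flush them in one write."""

    def __init__(self, write_batch):
        self._pending = Counter()
        # Hypothetical callback that performs one bulk write to the database.
        self._write_batch = write_batch

    def record_result(self, country, cpu_type, team):
        # No database access here: just bump in-memory counters.
        self._pending[("country", country)] += 1
        self._pending[("cpu_type", cpu_type)] += 1
        self._pending[("team", team)] += 1

    def flush(self):
        # One write carries the combined totals for every result since the last flush.
        if self._pending:
            self._write_batch(dict(self._pending))
            self._pending.clear()

# Usage: a list stands in for the database so the number of writes is visible.
writes = []
batcher = AccountingBatcher(writes.append)
for i in range(1000):
    batcher.record_result("NZ" if i % 2 else "US", "x86", "team-a")
batcher.flush()
print(len(writes))
```

The trade-off is the one described above: the accounting totals lag behind reality until the next flush, but the path that receives results no longer waits on several record updates per result.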
It quickly became clear that we needed more powerful server hardware. Sun Microsystems came to our rescue, and over the next year they donated several of their high-performance server machines. Even with these improvements, server performance continues to be an issue. Resources in general, especially funding and manpower, have been a problem. We've received funds from a variety of private donors and a grant from the University of California. This money has been enough to hire about three full-time employees. A project of similar magnitude in the private sector would probably employ 20 or 30 people. We've had to cut corners in many areas (for example, there is no customer support), and some tasks have fallen far behind schedule.
Another problem area involved processor-specific optimizations. The SETI@home client is written in C++, and we compile it using standard compilers such as Microsoft VC++ and Gnu's
Additional content appearing in this section has been removed.
Human factors
Early in 1998, we launched a SETI@home web site describing the idea and letting people sign up. It was a good time to start a project like SETI@home. Public interest in SETI had been stirred by the movie Contact, which was released in July 1997. This movie, based on a novel by Carl Sagan, describes radio SETI in reasonably accurate terms, and parts of it were filmed at Arecibo.
It became clear that there would be no shortage of participants: over 400,000 people signed up at the web site. After a long period of development and testing, we released the client software on May 17, 1999. In the first week after the launch, over 200,000 people downloaded and ran the client. This number has grown to 2,400,000 as of October 2000. People in 226 countries around the world run SETI@home. Half of them are outside the U.S.; there are even 73 in Antarctica.
People have helped SETI@home in every way imaginable. People upgrade their computers, or buy new computers, just to run SETI@home faster. In Europe, people run SETI@home in spite of expensive Internet connection setup charges. Volunteers translated the web site into about 30 foreign languages. A number of people have written programs that track their work in elaborate detail. Graphic artists sent us dozens of banner and link graphics; one of these was so attractive that it replaced Gedye's original planet-and-wave image (which he threw together in PowerPoint) as our logo.
When it became clear that SETI@home was being widely embraced by the public, several questions arose. How was the word about SETI@home being spread? Why were people running SETI@home? Were they leaving their computers on longer, or buying faster computers, because of SETI@home?
We've heard the following "viral marketing" scenario from many sources: one person in an office starts running SETI@home; people see the screensaver graphics, ask about it, hear the explanation of the project, and try it themselves. Soon the entire office is running it.
The world's most powerful computer
Scientific computations are often measured in units of floating-point operations: additions and multiplications of numbers with fractional parts, like 42.0 or 3.14159. A common unit of supercomputer speed is trillions of floating-point operations per second, or TFLOPS.
The 1.0 TFLOPS barrier has been broken only in the last year or so. The fastest supercomputer is currently the ASCI White, built by IBM for the U.S. Department of Energy. It costs $110 million, weighs 106 tons, and has a peak performance of 12.3 TFLOPS.
SETI@home is faster than ASCI White, at less than 1% of the cost. The FFT computations for each SETI@home work unit require 3.1 trillion floating-point operations. In a typical day, SETI@home clients process about 700,000 work units. This works out to over 20 TFLOPS. It has cost about $500,000, plus another $200,000 or so in donated hardware, to develop SETI@home and operate it for a year. Of course, the cost of the one million PCs running SETI@home greatly exceeds that of ASCI White—but these PCs were bought and paid for before SETI@home and would exist even without it.
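The throughput and cost comparison follows directly from the figures in this paragraph. The sketch below simply redoes that arithmetic; the variable names are illustrative, and it compares SETI@home's sustained daily average against ASCI White's quoted peak, as the text does.

```python
FLOP_PER_WORK_UNIT = 3.1e12       # from the text
WORK_UNITS_PER_DAY = 700_000      # from the text
SECONDS_PER_DAY = 24 * 60 * 60

sustained_tflops = FLOP_PER_WORK_UNIT * WORK_UNITS_PER_DAY / SECONDS_PER_DAY / 1e12

SETI_COST_USD = 500_000 + 200_000     # development and operation plus donated hardware
ASCI_WHITE_COST_USD = 110_000_000
cost_fraction = SETI_COST_USD / ASCI_WHITE_COST_USD

print(f"sustained throughput: {sustained_tflops:.1f} TFLOPS")
print(f"cost relative to ASCI White: {cost_fraction:.2%}")
```

The result is roughly 25 TFLOPS at well under 1% of the cost, consistent with the "over 20 TFLOPS" and "less than 1%" stated above.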
As of October 2000, SETI@home has received 200 million results, for a total of 4 × 10^20 floating-point operations. We believe that this is the largest computation ever performed. And in terms of the potential of the Internet for scientific computing, SETI@home is the tip of the iceberg. There are projected to be one billion Internet-connected computers by 2003. If 10% of them participate in distributed computing projects, there will be enough computing power for 100 projects the size of SETI@home.
To what range of problems is this power applicable? Certainly not all problems. It must be possible to factor the problem into a large number of pieces that can be handled in parallel, with few or no interdependencies between the pieces. The ratio between communication and computation must be fairly low: for example, it mustn't take an hour to transfer the data for one second of computing.
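A rough way to test a problem against the second criterion is to compare transfer time with compute time per task. The sketch below uses SETI@home's own figures from this chapter (a 0.3 MB work unit, a 28.8-Kbps modem, 3.1 trillion operations per work unit); the host speed of 100 million floating-point operations per second is purely an assumption for illustration, as is the function name.

```python
def comm_to_compute_ratio(payload_bytes, link_bps, flop_per_task, host_flops):
    """Seconds spent transferring data per second spent computing on it."""
    transfer_seconds = payload_bytes * 8 / link_bps
    compute_seconds = flop_per_task / host_flops
    return transfer_seconds / compute_seconds

# Work unit size, modem speed, and operation count are from the chapter;
# the 100 MFLOPS host speed is an assumed figure, not a measured one.
ratio = comm_to_compute_ratio(
    payload_bytes=0.3e6,
    link_bps=28_800,
    flop_per_task=3.1e12,
    host_flops=100e6,
)

# The counterexample from the text: an hour of transfer for one second of computing.
bad_ratio = 3600 / 1

print(f"SETI@home-like task: {ratio:.4f}")
print(f"poorly suited task:  {bad_ratio:.0f}")
```

Under the assumed host speed, a work unit takes about a minute and a half to download and hours to process, so the ratio is far below 1; the counterexample sits several orders of magnitude on the wrong side.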
The peer-to-peer paradigm
In the brief history of computer technology, there have been several stages in the way computer systems are structured. The dominant paradigm today is called client/server: Information is concentrated in centrally located server computers and distributed through networks to client computers that act primarily as user interface devices. Client/server is a successor to the earlier desktop computing and mainframe paradigms.
Today's typical personal computer has a very fast processor, lots of unused disk space, and the ability to send data on the Internet—the same capabilities required of server computers. The sheer quantity of Internet-connected computers suggests a new paradigm in which tasks currently handled by central servers (such as supercomputing and data serving) are spread across large numbers of personal computers. In effect, the personal computer acts as both client and server. This new paradigm has been dubbed peer-to-peer (P2P). SETI@home and Napster (a program, released about the same time as SETI@home, that allows people to share sound files over the Internet) are often cited as the first major examples of P2P systems.
The huge number of computers participating in a P2P system can overcome the fact that individual computers may be only sporadically available (i.e., their owners may turn them off or disconnect them from the Internet). Software techniques such as data replication can combine a large number of slow, unreliable components into a fast, highly reliable system.
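The effect of replication can be estimated with a simple probability model. This sketch assumes each peer is online independently with the same probability, which real networks only approximate; the function names and the 30% and 99.9% figures are illustrative assumptions, not numbers from any deployed system.

```python
import math

def availability(p_online, replicas):
    """Probability that at least one of `replicas` independent copies is reachable."""
    return 1 - (1 - p_online) ** replicas

def replicas_needed(p_online, target):
    """Smallest number of copies whose combined availability meets `target`.

    Both arguments must lie strictly between 0 and 1.
    """
    return math.ceil(math.log(1 - target) / math.log(1 - p_online))

# Hypothetical example: peers online 30% of the time, 99.9% availability wanted.
n = replicas_needed(0.3, 0.999)
print(n, availability(0.3, n))
```

Because the chance that every copy is offline shrinks geometrically with each added replica, a modest number of unreliable peers can match the availability of a well-run central server.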
The P2P paradigm has a human as well as a technical side—it shifts power, and therefore control, away from organizations and toward individuals. This might lead, for example, to a music distribution system that efficiently matches musicians and listeners, eliminating the dilution and homogenization of mass marketing. For scientific computing, it could contribute to a democratization of science: a research project that needs massive supercomputing will have to explain its research to the public and argue the merit of the research. This, I believe, is a worthwhile goal and will be a significant accomplishment for SETI@home even if no extraterrestrial signal is found.
Chapter 6: Jabber: Conversational Technologies
Jeremie Miller, Jabber
Conversations are an important part of our daily lives. For most people, in fact, they are the most important way to acquire and spread knowledge during a normal working day.
Conversations provide a comfortable medium in which knowledge flows in both directions, and where contributors share an inherent context through their subjects and relationships. In addition to older forms of conversation (in person and over the phone), conversations are becoming an increasingly important part of the networked world. Witness the popularity of email, chat, and instant messaging, which let users extend the range and scope of their conversations to reach people they could not have reached before.
Still, little attention has been paid in recent years to the popular Internet channels that most naturally support conversations. Instead, most people see the Web as the driving force, and they view it as a content delivery platform rather than as a place for exchanges among equals. The dominance of the Web has come about because it has succeeded in becoming a fundamentally unifying technology that provides access to content in all forms and formats. However, it tends toward being a traditional one-way broadcast medium, with the largest base of users being passive recipients of content.
Conversations have a stubborn way of reemerging in any human activity, however. Recently, much of the excitement and buzz around the Web have centered on sites that use it as a conversational medium. These conversations take place within a particular web site (Slashdot, eBay, Amazon.com) or an application (Napster, AIM/ICQ, Netshow).
And repeating the history of the pre-Web Internet, the new conversations sprout up in a disjointed, chaotic variety where the left hand doesn't know what the right hand is doing. The Web was a godsend for lowering the barrier to access information; it increased the value of all content by unifying the technologies that described and delivered that content. In the same way, Internet conversations stand to benefit significantly from the introduction of a common platform designed to support the rich, dynamic, and flexible nature of a conversation.
Conversations and peers