Producing Open Source Software: How to Run a Successful Free Software Project

By Karl Fogel



Table of Contents

Chapter 1: Introduction
Most free software projects fail.
We tend not to hear very much about the failures. Only successful projects attract attention, and there are so many free software projects in total that even though only a small percentage succeed, the result is still a lot of visible projects. We also don't hear about the failures because failure is not an event. There is no single moment when a project ceases to be viable; people just sort of drift away and stop working on it. There may be a moment when a final change is made to the project, but those who made it usually didn't know at the time that it was the last one. There is not even a clear definition of when a project has expired. Is it when it hasn't been actively worked on for six months? When its user base stops growing, without having exceeded the developer base? What if the developers of one project abandon it because they realized they were duplicating the work of another, and what if they join that other project, then expand it to include much of their earlier effort? Did the first project end, or just change homes?
Because of such complexities, it's impossible to put a precise number on the failure rate. But anecdotal evidence from over a decade in open source, some casting around on SourceForge.net, and a little Googling all point to the same conclusion: the rate is extremely high, probably on the order of 90-95%. The number climbs higher if you include surviving but dysfunctional projects: those which are producing running code, but which are not pleasant places to be, or are not making progress as quickly or as dependably as they could.
This book is about avoiding failure. It examines not only how to do things right, but how to do them wrong, so you can recognize and correct problems early. My hope is that after reading it, you will have a repertoire of techniques not just for avoiding common pitfalls of open source development, but also for dealing with the growth and maintenance of a successful project. Success is not a zero-sum game, and this book is not about winning or getting ahead of the competition. Indeed, an important part of running an open source project is working smoothly with other, related projects. In the long run, every successful project contributes to the well-being of the overall, worldwide body of free software.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
History
Software sharing has been around as long as software itself. In the early days of computers, manufacturers felt that competitive advantages were to be had mainly in hardware innovation, and therefore didn't pay much attention to software as a business asset. Many of the customers for these early machines were scientists or technicians, who were able to modify and extend the software shipped with the machine themselves. Customers sometimes distributed their patches back not only to the manufacturer, but to other owners of similar machines. The manufacturers often tolerated and even encouraged this: in their eyes, improvements to the software, from whatever source, just made the machine more attractive to other potential customers.
Although this early period resembled today's free software culture in many ways, it differed in two crucial respects. First, there was as yet little standardization of hardware—it was a time of flourishing innovation in computer design, but the diversity of computing architectures meant that everything was incompatible with everything else. Thus, software written for one machine would generally not work on another. Programmers tended to acquire expertise in a particular architecture or family of architectures (whereas today they would be more likely to acquire expertise in a programming language or family of languages, confident that their expertise will be transferable to whatever computing hardware they happen to find themselves working with). Because a person's expertise tended to be specific to one kind of computer, their accumulation of expertise had the effect of making that computer more attractive to them and their colleagues. It was therefore in the manufacturer's interests for machine-specific code and knowledge to spread as widely as possible.
Second, there was no Internet. Though there were fewer legal restrictions on sharing than today, there were more technical ones: the means of getting data from place to place were inconvenient and cumbersome, relatively speaking. There were some small, local networks, good for sharing information among employees at the same research lab or company. But there remained barriers to overcome if one wanted to share with everyone, no matter where they were. These barriers
The Situation Today
When running a free software project, you won't need to talk about such weighty philosophical matters on a daily basis. Programmers will not insist that everyone else in the project agree with their views on all things (those who do insist on this quickly find themselves unable to work on any project). But you do need to be aware that the question of "free" versus "open source" exists, partly to avoid saying things that might be inimical to some of the participants, and partly because understanding developers' motivations is the best way—in some sense, the only way—to manage a project.
Free software is a culture by choice. To operate successfully in it, you have to understand why people choose to be in it in the first place. Coercive techniques don't work. If people are unhappy in one project, they will just wander off to another one. Free software is remarkable even among volunteer communities for its lightness of investment. Most of the people involved have never actually met the other participants face-to-face, and simply donate bits of time whenever they feel like it. The normal conduits by which humans bond with each other and form lasting groups are narrowed down to a tiny channel: the written word, carried over electronic wires. Because of this, it can take a long time for a cohesive and dedicated group to form. Conversely, it's quite easy for a project to lose a potential volunteer in the first five minutes of acquaintanceship. If a project doesn't make a good first impression, newcomers rarely give it a second chance.
The transience, or rather the potential transience, of relationships is perhaps the single most daunting challenge facing a new project. What will persuade all these people to stick together long enough to produce something useful? The answer to that question is complex enough to occupy the rest of this book, but if it had to be expressed in one sentence, it would be this:
People should feel that their connection to a project, and influence over it, is directly proportional to their contributions.
Chapter 2: Getting Started
The classic model of how free software projects get started was supplied by Eric Raymond, in a now-famous paper on open source processes entitled "The Cathedral and the Bazaar." He wrote:
Every good work of software starts by scratching a developer's personal itch. (from http://www.catb.org/~esr/writings/cathedral-bazaar/)
Note that Raymond wasn't saying that open source projects happen only when some individual gets an itch. Rather, he was saying that good software results when the programmer has a personal interest in seeing the problem solved; the relevance of this to free software was that a personal itch happened to be the most frequent motivation for starting a free software project.
This is still how most free projects are started, but less so now than in 1997, when Raymond wrote those words. Today, we have the phenomenon of organizations, including for-profit corporations, starting large, centrally managed open source projects from scratch. The lone programmer, banging out some code to solve a local problem and then realizing the result has wider applicability, is still the source of much new free software, but is not the only story.
Raymond's point is still insightful, however. The essential condition is that the producers of the software have a direct interest in its success, because they use it themselves. If the software doesn't do what it's supposed to do, the person or organization producing it will feel the dissatisfaction in their daily work. For example, the OpenAdapter project (http://www.openadapter.org/), which was started by investment bank Dresdner Kleinwort Wasserstein as an open source framework for integrating disparate financial information systems, can hardly be said to scratch any individual programmer's personal itch. It scratches an institutional itch. But that itch arises directly from the experiences of the institution and its partners, and therefore if the project fails to relieve them, they will know. This arrangement produces good software because the feedback loop flows in the right direction. The program isn't being written to be sold to someone else so they can solve
First, Look Around
Before you start an open source project, there is one important caveat:
Always look around to see if there's an existing project that does what you want. The chances are pretty good that whatever problem you want solved now, someone else wanted solved before you. If they did solve it, and released their code under a free license, then there's no reason for you to reinvent the wheel today. There are exceptions, of course: if you want to start a project as an educational experience, pre-existing code won't help; or maybe the project you have in mind is so specialized that you know there is zero chance anyone else has done it. But generally, there's no point in not looking, and the payoff can be huge. If the usual Internet search engines don't turn up anything, try searching on http://freshmeat.net/ (an open source project news site, about which more will be said later), on http://www.sourceforge.net/, and in the Free Software Foundation's directory of free software at http://directory.fsf.org/.
Even if you don't find exactly what you were looking for, you might find something so close that it makes more sense to join that project and add functionality than to start from scratch yourself.
Starting from What You Have
You've looked around, found that nothing out there really fits your needs, and decided to start a new project.
What now?
The hardest part about launching a free software project is transforming a private vision into a public one. You or your organization may know perfectly well what you want, but expressing that goal comprehensibly to the world is a fair amount of work. It is essential, however, that you take the time to do it. You and the other founders must decide what the project is really about—that is, decide its limitations, what it won't do as well as what it will—and write up a mission statement. This part is usually not too hard, though it can sometimes reveal unspoken assumptions and even disagreements about the nature of the project, which is fine: better to resolve those now than later. The next step is to package up the project for public consumption, and this is, basically, pure drudgery.
What makes it so laborious is that it consists mainly of organizing and documenting things everyone already knows—"everyone," that is, who's been involved in the project so far. Thus, for the people doing the work, there is no immediate benefit. They do not need a README file giving an overview of the project, nor a design document or user manual. They do not need a carefully arranged code tree conforming to the informal but widespread standards of software source distributions. Whatever way the source code is arranged is fine for them, because they're already accustomed to it anyway, and if the code runs at all, they know how to use it. It doesn't even matter, for them, if the fundamental architectural assumptions of the project remain undocumented; they're already familiar with that too.
Newcomers, on the other hand, need these things. Fortunately, they don't need them all at once. It's not necessary for you to provide every possible resource before taking a project public. In a perfect world, perhaps, every new open source project would start out life with a thorough design document, a complete user manual (with special markings for features planned but not yet implemented), beautifully and portably packaged code, capable of running on any computing platform, and so on. In reality, taking care of all these loose ends would be prohibitively time-consuming, and anyway, it's work that one can reasonably hope volunteers will help with once the project is under way.
Choosing a License and Applying It
This section is intended to be a very quick, very rough guide to choosing a license. Read Chapter 9 to understand the detailed legal implications of the different licenses, and how the license you choose can affect people's ability to mix your software with other free software.
There are a great many free software licenses to choose from. Most of them we needn't consider here, as they were written to satisfy the particular legal needs of some corporation or person, and wouldn't be appropriate for your project. We will restrict ourselves to just the most commonly used licenses; in most cases, you will want to choose one of them.
If you're comfortable with your project's code potentially being used in proprietary programs, then use an MIT/X-style license. It is the simplest of several minimal licenses that do little more than assert nominal copyright (without actually restricting copying) and specify that the code comes with no warranty. See Section 9.4.1 in Chapter 9 for details.
If you don't want your code to be used in proprietary programs, use the GNU General Public License (http://www.gnu.org/licenses/gpl.html). The GPL is probably the most widely recognized free software license in the world today. This is in itself a big advantage, since many potential users and contributors will already be familiar with it, and therefore won't have to spend extra time to read and understand your license. See Section 9.4.2 in Chapter 9 for details.
Once you've chosen a license, you should state it on the project's front page. You don't need to include the actual text of the license there; just give the name of the license, and make it link to the full license text on another page.
This tells the public what license you intend the software to be released under, but it's not sufficient for legal purposes. For that, the software itself must contain the license. The standard way to do this is to put the full license text in a file called COPYING (or LICENSE), and then put a short notice at the top of each source file, naming the copyright date, holder, and license, and saying where to find the full text of the license.
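To illustrate the per-file notice described above, here is a minimal Python sketch that generates such a header and runs a crude check for it. The helper names `license_header` and `has_license_header`, the holder, the year, and the exact notice wording are hypothetical. A real notice should follow whatever wording the chosen license itself recommends.

```python
def license_header(year: int, holder: str, license_name: str,
                   license_file: str = "COPYING") -> str:
    """Return a comment block naming the copyright date, holder, and
    license, and saying where to find the full license text."""
    lines = [
        f"Copyright (C) {year} {holder}",
        f"This file is licensed under the {license_name}.",
        f"See the file {license_file} for the full license text.",
    ]
    return "".join(f"# {line}\n" for line in lines)


def has_license_header(source: str, license_name: str) -> bool:
    """Rough check: do the first few lines of a file mention a copyright
    and the expected license?"""
    head = "\n".join(source.splitlines()[:10])
    return "Copyright" in head and license_name in head


if __name__ == "__main__":
    # Hypothetical holder and year, for illustration only.
    header = license_header(2005, "J. Random", "MIT License")
    print(header)
    print(has_license_header(header + "print('hello')\n", "MIT License"))
```

A check like `has_license_header` could run over every source file before a release, so that files missing the notice are caught early.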
Setting the Tone
So far we've covered one-time tasks you do during project setup: picking a license, arranging the initial web site, etc. But the most important aspects of starting a new project are dynamic. Choosing a mailing list address is easy; ensuring that the list's conversations remain on-topic and productive is another matter entirely. If the project is being opened up after years of closed, in-house development, its development processes will change, and you will have to prepare the existing developers for that change.
The first steps are the hardest, because precedents and expectations for future conduct have not yet been set. Stability in a project does not come from formal policies, but from a shared, hard-to-pin-down collective wisdom that develops over time. There are often written rules as well, but they tend to be essentially a distillation of the intangible, ever-evolving agreements that really guide the project. The written policies do not define the project's culture so much as describe it, and even then only approximately.
There are a few reasons why things work out this way. Growth and high turnover are not as damaging to the accumulation of social norms as one might think. As long as change does not happen too quickly, there is time for new arrivals to learn how things are done, and after they learn, they will help reinforce those ways themselves. Consider how children's songs survive the centuries. There are children today singing roughly the same rhymes as children did hundreds of years ago, even though there are no children alive now who were alive then. Younger children hear the songs sung by older ones, and when they are older, they in turn will sing them in front of other younger ones. The children are not engaging in a conscious program of transmission, of course, but the reason the songs survive is nonetheless that they are transmitted regularly and repeatedly. The time scale of free software projects may not be measured in centuries (we don't know yet), but the dynamics of transmission are much the same. The turnover rate is faster, however, and must be compensated for by a more active and deliberate transmission effort.
Announcing
Once the project is presentable—not perfect, just presentable—you're ready to announce it to the world. This is actually a very simple process: go to http://freshmeat.net/, click on Submit in the top navigation bar, and fill out a form announcing your new project. Freshmeat is the place everyone watches for new project announcements. You only have to catch a few eyes there for news of your project to spread by word of mouth.
If you know of mailing lists or newsgroups where an announcement of your project would be on-topic and of interest, then post there, but be careful to make exactly one post per forum, and to direct people to your project's own forums for follow-up discussion (by setting the Reply-to header). The posts should be short and get right to the point:
To: discuss@lists.example.org
Subject: [ANN] Scanley full-text indexer project
Reply-to: dev@scanley.org

This is a one-time post to announce the creation of the Scanley
project, an open source full-text indexer and search engine with a
rich API, for use by programmers in providing search services for
large collections of text files.  Scanley is now running code, is
under active development, and is looking for both developers and
testers.

Home page: http://www.scanley.org/

Features:
   - Searches plain text, HTML, and XML
   - Word or phrase searching
   - (planned) Fuzzy matching
   - (planned) Incremental updating of indexes
   - (planned) Indexing of remote web sites

Requirements:
   - Python 2.2 or higher
   - Enough disk space to hold the indexes (approximately twice
     original data size) 

For more information, please come to scanley.org.

Thank you,
-J. Random
(See Section 6.6 in Chapter 6 for advice on announcing further releases and other project events.)
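The announcement above can also be assembled programmatically. The following minimal sketch uses Python's standard `email.message.EmailMessage`. It only builds the message, with the Reply-To header pointing follow-ups at the project's own list, and does not send it. The helper name `build_announcement` is hypothetical, and the addresses are the example ones from the sample post.

```python
from email.message import EmailMessage


def build_announcement(to_addr: str, reply_to: str, subject: str, body: str) -> EmailMessage:
    """Build (but do not send) a one-time announcement post."""
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["Subject"] = subject
    # Direct follow-up discussion to the project's own list, not the forum.
    msg["Reply-To"] = reply_to
    msg.set_content(body)
    return msg


if __name__ == "__main__":
    post = build_announcement(
        to_addr="discuss@lists.example.org",
        reply_to="dev@scanley.org",
        subject="[ANN] Scanley full-text indexer project",
        body="This is a one-time post to announce the creation of the Scanley\n"
             "project, an open source full-text indexer and search engine.\n",
    )
    print(post.as_string())
```

To actually post it, you could hand the message to `smtplib.SMTP.send_message`. Keeping to exactly one post per forum remains up to you.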
There is an ongoing debate in the free software world about whether it is necessary to begin with running code, or whether a project can benefit from being opened even during the design/discussion stage. I used to think starting with running code was the most important factor, that it was what separated successful projects from toys, and that serious developers would be attracted only to software that did something concrete already.
Chapter 3: Technical Infrastructure
Free software projects rely on technologies that support the selective capture and integration of information. The more skilled you are at using these technologies, and at persuading others to use them, the more successful your project will be. This only becomes more true as the project grows. Good information management is what prevents open source projects from collapsing under the weight of Brooks' Law, which states that adding manpower to a late software project makes it later. Fred Brooks observed that the complexity of a project increases as the square of the number of participants. When only a few people are involved, everyone can easily talk to everyone else, but when hundreds of people are involved, it is no longer possible for each person to remain constantly aware of what everyone else is doing. If good free software project management is about making everyone feel like they're all working together in the same room, the obvious question is: what happens when everyone in a crowded room tries to talk at once?
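The "square of the number of participants" comes from counting pairwise communication channels. If every participant may need to coordinate with every other, then n people form n(n-1)/2 possible pairs, which is the formula Brooks gives in The Mythical Man-Month. A tiny Python sketch makes the growth concrete (the function name is hypothetical):

```python
def communication_channels(participants: int) -> int:
    """Number of distinct pairs of participants who might need to talk:
    n * (n - 1) / 2, which grows roughly as the square of n."""
    if participants < 0:
        raise ValueError("participants must be non-negative")
    return participants * (participants - 1) // 2


if __name__ == "__main__":
    for n in (2, 5, 10, 100, 500):
        print(f"{n:>4} participants -> {communication_channels(n):>6} channels")
```

Going from 10 to 100 participants multiplies the number of possible channels by 110 (from 45 to 4,950). This is why routing and recording information well matters far more in a large project than in a small one.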
This problem is not new. In non-metaphorical crowded rooms, the solution is parliamentary procedure: formal guidelines for how to have real-time discussions in large groups, how to make sure important dissents are not lost in floods of "me-too" comments, how to form subcommittees, how to recognize when decisions are made, etc. An important part of parliamentary procedure is specifying how the group interacts with its information management system. Some remarks are made "for the record," others are not. The record itself is subject to direct manipulation, and is understood to be not a literal transcript of what occurred, but a representation of what the group is willing to agree occurred. The record is not monolithic, but takes different forms for different purposes. It comprises the minutes of individual meetings, the complete collection of all minutes of all meetings, summaries, agendas and their annotations, committee reports, reports from correspondents not present, lists of action items, etc.
Because the Internet is not really a room, we don't have to worry about replicating those parts of parliamentary procedure that keep some people quiet while others are speaking. But when it comes to information management techniques, well-run open source projects are parliamentary procedure on steroids. Since almost all communication in open source projects happens in writing, elaborate systems have evolved for routing and labeling data appropriately; for minimizing repetitions so as to avoid spurious divergences; for storing and retrieving data; for correcting bad or obsolete information; and for associating disparate bits of information with each other as new connections are observed. Active participants in open source projects internalize many of these techniques, and will often perform complex manual tasks to ensure that information is routed correctly. But the whole endeavor ultimately depends on sophisticated software support. As much as possible, the communications media themselves should do the routing, labeling, and recording, and should make the information available to humans in the most convenient way possible. In practice, of course, humans will still need to intervene at many points in the process, and it's important that the software make such interventions convenient too. But in general, if the humans take care to label and route information accurately on its first entry into the system, then the software should be configured to make as much use of that metadata as possible.
What a Project Needs
Most open source projects offer at least a minimum, standard set of tools for managing information:
Web site
Primarily a centralized, one-way conduit of information from the project out to the public. The web site may also serve as an administrative interface for other project tools.
Mailing lists
Usually the most active communications forum in the project, and the "medium of record."
Version control
Enables developers to manage code changes conveniently, including reverting and "change porting." Enables everyone to watch what's happening to the code.
Bug tracking
Enables developers to keep track of what they're working on, coordinate with each other, and plan releases. Enables everyone to query the status of bugs and record information (e.g., reproduction recipes) about particular bugs. Can be used for tracking not only bugs, but also tasks, releases, new features, etc.
Real-time chat
A place for quick, lightweight discussions and question/answer exchanges. Not always archived completely.
Each tool in this set addresses a distinct need, but their functions are also interrelated, and the tools must be made to work together. Below we will examine how they can do so, and more importantly, how to get people to use them. The web site is not discussed until the end, since it acts more as glue for the other components than as a tool unto itself.
You may be able to avoid a lot of the headache of choosing and configuring these tools by using a canned hosting site: a server that offers prepackaged, templatized web areas with all the accompanying tools needed to run a free software project. See Section 3.7.1 later in this chapter for a discussion of the advantages and disadvantages of canned hosting.
Mailing Lists
Mailing lists are the bread and butter of project communications. If a user is exposed to any forum besides the web pages, it is most likely to be one of the project's mailing lists. But before she experiences the mailing list itself, she will experience the mailing list interface—that is, the mechanism by which she joins ("subscribes to") the list. This brings us to Rule #1 of mailing lists:
Don't try to manage mailing lists by hand—get list management software.
It will be tempting to put this off. Setting up mailing list management software might seem like overkill at first. Managing small, low-traffic lists by hand will seem seductively easy: you just set up a subscription address that forwards to you, and when someone mails it, you add (or remove) their email address in some text file that holds all the addresses on the list. What could be simpler?
The trick is that good mailing list management, which is what people have come to expect, is not simple at all. It's not just about subscribing and unsubscribing users when they ask. It's also about moderating to prevent spam, offering the mailing list in digest versus message-by-message form, providing standard list and project information by means of auto-responders, and various other things. A human being monitoring a subscription address can supply only a bare minimum of functionality, and even then not as reliably and promptly as software could.
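To show why even the basics amount to more than a forwarding address and a text file, here is a deliberately small Python sketch of three of the features just mentioned: subscribing and unsubscribing, an automated welcome reply, and digest versus message-by-message delivery. All names (`MailingList`, `DeliveryMode`, the addresses, and the unsubscribe instruction text) are hypothetical. The sketch leaves out moderation, spam filtering, bounce handling, and web subscription entirely, which is exactly why the advice is to use real list management software rather than something like this.

```python
from dataclasses import dataclass, field
from enum import Enum


class DeliveryMode(Enum):
    INDIVIDUAL = "message-by-message"
    DIGEST = "digest"


@dataclass
class MailingList:
    address: str   # e.g. "dev@scanley.org"
    info_url: str  # where the welcome message points people
    subscribers: dict[str, DeliveryMode] = field(default_factory=dict)
    pending_digest: list[str] = field(default_factory=list)

    def subscribe(self, who: str, mode: DeliveryMode = DeliveryMode.INDIVIDUAL) -> str:
        """Add a subscriber and return the automated welcome reply."""
        self.subscribers[who.strip().lower()] = mode
        return (
            f"Welcome to {self.address} ({mode.value} delivery).\n"
            f"Project information and FAQ: {self.info_url}\n"
            # Hypothetical instruction; real software defines its own commands.
            "To unsubscribe, send a message containing the word 'unsubscribe'.\n"
        )

    def unsubscribe(self, who: str) -> bool:
        """Remove a subscriber; report whether they were subscribed."""
        return self.subscribers.pop(who.strip().lower(), None) is not None

    def post(self, message: str) -> list[tuple[str, str]]:
        """Deliver immediately to message-by-message subscribers and queue
        the message for the next digest."""
        self.pending_digest.append(message)
        return [(who, message) for who, mode in self.subscribers.items()
                if mode is DeliveryMode.INDIVIDUAL]

    def flush_digest(self) -> list[tuple[str, str]]:
        """Bundle all queued messages into one digest for digest subscribers."""
        if not self.pending_digest:
            return []
        digest = "\n\n----\n\n".join(self.pending_digest)
        self.pending_digest.clear()
        return [(who, digest) for who, mode in self.subscribers.items()
                if mode is DeliveryMode.DIGEST]


if __name__ == "__main__":
    ml = MailingList("dev@scanley.org", "http://www.scanley.org/")
    print(ml.subscribe("alice@example.org"))
    ml.subscribe("bob@example.org", DeliveryMode.DIGEST)
    print(ml.post("First message"))
    print(ml.flush_digest())
```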
Modern list management software usually offers at least the following features:
Both email- and web-based subscription
When a user subscribes to a list, she should promptly get an automated welcome message in reply, telling her what she has subscribed to, how to interact further with the mailing list software, and (most importantly) how to unsubscribe. This automatic reply can be customized to contain project-specific information, of course, such as the project's web site, FAQ location, etc.
Subscription in either digest mode or message-by-message mode
Version Control
A version control system (or revision control system) is a combination of technologies and practices for tracking and controlling changes to a project's files, in particular to source code, documentation, and web pages. If you have never used version control before, the first thing you should do is go find someone who has, and get them to join your project. These days, everyone will expect at least your project's source code to be under version control, and probably will not take the project seriously if it doesn't use version control with at least minimal competence.
The reason version control is so universal is that it helps with virtually every aspect of running a project: interdeveloper communications, release management, bug management, code stability and experimental development efforts, and attribution and authorization of changes by particular developers. The version control system provides a central coordinating force among all of these areas. The core of version control is change management: identifying each discrete change made to the project's files, annotating each change with metadata like the change's date and author, and then replaying these facts to whoever asks, in whatever way they ask. It is a communications mechanism where a change is the basic unit of information.
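To make "a change is the basic unit of information" concrete, here is a toy Python model of a change log. Each change gets a sequential identifier plus author and date metadata, and the log can be replayed in order, filtered by author or by file. This is an illustrative sketch with hypothetical names, not how any particular version control system stores history. Real systems also record the content of each change, not just which files it touched.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Change:
    """One discrete change, annotated with metadata."""
    revision: int
    author: str
    when: date
    message: str
    files: tuple[str, ...]


class ChangeLog:
    """Toy change log: records changes and replays them on request."""

    def __init__(self) -> None:
        self._changes: list[Change] = []

    def record(self, author: str, when: date, message: str, files: list[str]) -> Change:
        change = Change(len(self._changes) + 1, author, when, message, tuple(files))
        self._changes.append(change)
        return change

    def query(self, author: Optional[str] = None, path: Optional[str] = None) -> list[Change]:
        """Replay changes in order, optionally filtered by author and/or file."""
        return [
            c for c in self._changes
            if (author is None or c.author == author)
            and (path is None or path in c.files)
        ]


if __name__ == "__main__":
    log = ChangeLog()
    log.record("alice", date(2005, 3, 1), "Add tokenizer", ["tokenize.py"])
    log.record("bob", date(2005, 3, 2), "Document the API", ["README", "tokenize.py"])
    for c in log.query(path="tokenize.py"):
        print(c.revision, c.when.isoformat(), c.author, c.message)
```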
This section does not discuss all aspects of using a version control system. Version control is so all-encompassing that it must be addressed topically throughout the book. Here, we will concentrate on choosing and setting up a version control system in a way that will foster cooperative development down the road.
This book cannot teach you how to use version control if you've never used it before, but it would be impossible to discuss the subject without a few key terms. These terms are useful independently of any particular version control system: they are the basic nouns and verbs of networked collaboration, and will be used generically throughout the rest of this book. Even if there were no version control systems in the world, the problem of change management would remain, and these words give us a language for talking about that problem concisely.
Bug Tracker
Bug tracking is a broad topic; various aspects of it are discussed throughout this book. Here I'll try to concentrate mainly on setup and technical considerations, but to get to those, we have to start with a policy question: exactly what kind of information should be kept in a bug tracker?
The term bug tracker is misleading. Bug tracking systems are also frequently used to track new feature requests, one-time tasks, unsolicited patches—really anything that has distinct beginning and end states, with optional transition states in between, and that accrues information over its lifetime. For this reason, bug trackers are also called issue trackers, defect trackers, artifact trackers, request trackers, trouble ticket systems, etc. See Appendix B for a list of software.
In this book, I'll continue to use bug tracker for the software that does the tracking, because that's what most people call it, but will use issue to refer to a single item in the bug tracker's database. This allows us to distinguish between the behavior or misbehavior that the user encountered (that is, the bug itself), and the tracker's record of the bug's discovery, diagnosis, and eventual resolution. Keep in mind that although most issues are about actual bugs, issues can be used to track other kinds of tasks too.
The classic issue life cycle looks like this:
  1. Someone files the issue. She provides a summary, an initial description (including a reproduction recipe, if applicable; see Section 8.1.5 in Chapter 8 for how to encourage good bug reports), and whatever other information the tracker asks for. The person who files the issue may be totally unknown to the project—bug reports and feature requests are as likely to come from the user community as from the developers.
    Once filed, the issue is in what's called an open state. Because no action has been taken yet, some trackers also label it as unverified and/or unstarted. It is not assigned to anyone; or, in some systems, it is assigned to a fake user to represent the lack of a real assignment. At this point, it is in a holding area: the issue has been recorded, but not yet integrated into the project's consciousness.
Additional content appearing in this section has been removed.
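The life cycle described above is essentially a small state machine. An issue starts out open and unassigned, moves through intermediate states, and accumulates a history along the way. The Python sketch below illustrates that idea. Only the starting point (open, no assignee) comes from the text. The other state names and the allowed transitions are hypothetical placeholders, because every tracker, and often every project, defines its own.

```python
# Minimal sketch: an issue record moving through a tracker's life cycle.
# Only the starting point (open, unassigned) comes from the text above; the
# other state names and allowed transitions are hypothetical placeholders.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Each state maps to the set of states it may move to next.
TRANSITIONS = {
    "open": {"assigned", "closed"},      # e.g. closed directly as a duplicate
    "assigned": {"open", "resolved"},    # unassigning returns it to the pool
    "resolved": {"closed", "open"},      # a rejected fix reopens the issue
    "closed": {"open"},                  # closed issues can be reopened
}


@dataclass
class Issue:
    summary: str
    description: str
    reporter: str
    state: str = "open"
    # None stands for "unassigned"; some trackers use a placeholder user instead.
    assignee: Optional[str] = None
    # (event, who) pairs: the record accrues information over its lifetime.
    history: List[Tuple[str, str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.history.append(("filed", self.reporter))

    def move(self, new_state: str, who: str, assignee: Optional[str] = None) -> None:
        """Apply one transition, refusing any the table does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state!r} to {new_state!r}")
        if new_state == "assigned":
            if assignee is None:
                raise ValueError("assigning an issue requires an assignee")
            self.assignee = assignee
        elif new_state == "open":
            self.assignee = None  # back in the holding area, unassigned
        self.state = new_state
        self.history.append((new_state, who))


if __name__ == "__main__":
    bug = Issue("Crash on empty input", "Run the tool with no input; it crashes.", reporter="alice")
    bug.move("assigned", who="bob", assignee="bob")
    bug.move("resolved", who="bob")
    bug.move("closed", who="alice")
    print(bug.state, bug.history)
```

Keeping the transitions in a single table makes a project's own workflow easy to read and change. The history list reflects the distinction drawn above between the bug itself and the tracker's record of its discovery, diagnosis, and resolution.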
IRC/Real-Time Chat Systems
Many projects offer real-time chat rooms using Internet Relay Chat (IRC), forums where users and developers can ask each other questions and get instant responses. While you can run your own IRC server, it is generally not worth the hassle. Instead, do what everyone else does: run your IRC channels at Freenode (http://freenode.net/). Freenode gives you the control you need to administer your project's IRC channels, while sparing you the not-insignificant trouble of maintaining an IRC server yourself.
The first thing to do is choose a channel name. The most obvious choice is the name of your project—if that's available at Freenode, then use it. If not, try to choose something as close to your project's name, and as easy to remember, as possible. Advertise the channel's availability from your project's web site, so a visitor with a quick question will see it right away. For example, this appears in a prominently placed box at the top of Subversion's home page:
If you're using Subversion, we recommend that you join the users@subversion.tigris.org mailing list, and read the Subversion Book (http://svnbook.red-bean.com/) and FAQ (http://subversion.tigris.org/faq.html). You can also ask questions on IRC at irc.freenode.net channel #svn.
Some projects have multiple channels, one per subtopic: for example, one channel for installation problems, another for usage questions, another for development chat, and so on (Section 6.4 in Chapter 6 discusses how to divide into multiple channels). When your project is young, there should be only one channel, with everyone talking together. Later, as the user-to-developer ratio increases, separate channels may become necessary.
How will people know all the available channels, let alone which channel to talk in? And when they talk, how will they know what the local conventions are?
The answer is to tell them by setting the channel topic. The channel topic is a brief message each user sees when they first enter the channel. It gives quick guidance to newcomers, and pointers to further information. For example:
Additional content appearing in this section has been removed.
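Most IRC clients let you set the topic with a /topic command. Behind the scenes, the client sends the protocol's TOPIC message. The Python sketch below builds that raw line. It is based on the TOPIC command and channel-name rules in RFC 2812, not on any particular client's code. The channel name #myproject and the topic text are hypothetical.

```python
# Minimal sketch based on the TOPIC command in RFC 2812: build the raw line a
# client sends to set a channel's topic. Channel name and text are hypothetical.

MAX_MESSAGE_BYTES = 512  # RFC 2812 per-message limit, including the CRLF
MAX_CHANNEL_LEN = 50     # RFC 2812 limit on channel name length


def topic_command(channel: str, topic: str) -> str:
    """Return 'TOPIC <channel> :<topic>' terminated by CRLF, or raise ValueError."""
    # Channel names start with '&', '#', '+' or '!' and must not contain
    # spaces, commas, or control-G (ASCII 7).
    if not channel or channel[0] not in "&#+!" or len(channel) > MAX_CHANNEL_LEN:
        raise ValueError(f"not a valid channel name: {channel!r}")
    if any(ch in channel for ch in (" ", ",", "\x07")):
        raise ValueError(f"illegal character in channel name: {channel!r}")
    # A CR or LF inside the topic would end the message early.
    if "\r" in topic or "\n" in topic:
        raise ValueError("the topic must be a single line")
    # The ':' prefix lets the final parameter contain spaces.
    line = f"TOPIC {channel} :{topic}\r\n"
    if len(line.encode("utf-8")) > MAX_MESSAGE_BYTES:
        raise ValueError("topic too long for a single IRC message")
    return line


if __name__ == "__main__":
    print(repr(topic_command("#myproject", "MyProject help | Dev chat: #myproject-dev")))
```

Treat the 512-byte check as an upper bound only. Servers generally enforce a shorter topic length of their own, which many advertise to clients as TOPICLEN. Channels may also restrict topic changes to channel operators.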
Wikis
A wiki is a web site that allows any visitor to edit or extend its content; the term "wiki" (from a Hawaiian word meaning "quick" or "super-fast") is also used to refer to the software that enables such editing. Wikis were invented in 1995, but their popularity did not really take off until 2000 or 2001, boosted partly by the success of Wikipedia (http://www.wikipedia.org/), a wiki-based free-content encyclopedia. Think of a wiki as falling somewhere between IRC and web pages: wikis don't happen in real time, so people get a chance to ponder and polish their contributions, but they are also very easy to add to, involving less interface overhead than editing a regular web page.
Wikis are not yet standard equipment for open source projects, but they probably will be soon. Because they are a relatively new technology, and people are still experimenting with different ways of using them, I will just offer a few words of caution here: at this stage, it's easier to analyze misuses of wikis than to analyze their successes.
If you decide to run a wiki, put a lot of effort into having a clear page organization and pleasing visual layout, so that visitors (i.e., potential editors) will instinctively know how to fit in their contributions. Equally important, post those standards on the wiki itself, so people have somewhere to go for guidance. Too often, wiki administrators fall victim to the fantasy that because hordes of visitors are individually adding high-quality content to the site, the sum of all these contributions must therefore also be of high quality. That's not how web sites work. Each individual page or paragraph may be good when considered by itself, but it will not be good if embedded in a disorganized or confusing whole. Wikis commonly suffer from:
Lack of navigational principles
A well-organized web site makes visitors feel like they know where they are at any time. For example, if the pages are well-designed, people can intuitively tell the difference between a "table of contents" region and a "content" region. Contributors to a wiki will respect such differences too, but only if the differences are present to begin with.
Additional content appearing in this section has been removed.
Web Site
There is not much to say about setting up the project web site from a technical point of view: setting up a web server and writing web pages are fairly simple tasks, and most of the important things to say about layout and arrangement were covered in the previous chapter. The web site's main function is to present a clear and welcoming overview of the project, and to bind together the other tools (the version control system, bug tracker, etc.). If you don't have the expertise to set up a web server yourself, it's usually not hard to find someone who does and is willing to help out. Nonetheless, to save time and effort, people often prefer to use one of the canned hosting sites.
There are two main advantages to using a canned site. The first is server capacity and bandwidth: their servers are beefy boxes sitting on really fat pipes. No matter how successful your project gets, you're not going to run out of disk space or swamp the network connection. The second advantage is simplicity. They have already chosen a bug tracker, a version control system, a mailing list manager, an archiver, and everything else you need to run a site. They've configured the tools, and are taking care of backups for all the data stored in the tools. You don't need to make many decisions. All you have to do is fill in a form, press a button, and suddenly you've got a project web site.
These are pretty significant benefits. The disadvantage, of course, is that you must accept their choices and configurations, even if something different would be better for your project. Usually canned sites are adjustable within certain narrow parameters, but you will never get the fine-grained control you would have if you set up the site yourself and had full administrative access to the server.
A perfect example of this is the handling of generated files. Certain project web pages may be generated files—for example, there are systems for keeping FAQ data in an easy-to-edit master format, from which HTML, PDF, and other presentation formats can be generated. As explained in Section 3.3.3.1 earlier in this chapter, you wouldn't want to version the generated formats, only the master file. But when your web site is hosted on someone else's server, it may be impossible to set up a custom hook to regenerate the online HTML version of the FAQ whenever the master file is changed. The only workaround is to version the generated formats too, so that they show up on the web site.
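As a rough illustration of the regeneration step such a hook would run, here is a minimal Python sketch that turns a master FAQ file into an HTML fragment. The master format is a hypothetical stand-in for whatever format a project actually uses: entries are separated by blank lines, each entry has a "Q:" line, and answer lines either start with "A:" or continue the answer. In Subversion, for instance, a post-commit hook is an executable invoked with, among other arguments, the repository path and the new revision number. This sketch leaves out two parts: checking whether the commit touched the FAQ, and copying the output to the web server.

```python
# Minimal sketch of the regeneration step a post-commit hook might run.
# The master format (blank-line separated entries with "Q:" and "A:" lines)
# is hypothetical; publishing the output is left out.
import html


def faq_to_html(master_text: str) -> str:
    """Render a simple Q:/A: master file as an HTML definition list."""
    parts = ["<dl>"]
    for entry in master_text.strip().split("\n\n"):
        question, answer = "", []
        for line in entry.splitlines():
            line = line.strip()
            if not line:
                continue                      # ignore stray blank lines
            if line.startswith("Q:"):
                question = line[2:].strip()
            elif line.startswith("A:"):
                answer.append(line[2:].strip())
            else:
                answer.append(line)           # continuation of the answer
        # Escape text so characters such as < and & display literally.
        parts.append(f"  <dt>{html.escape(question)}</dt>")
        parts.append(f"  <dd>{html.escape(' '.join(answer))}</dd>")
    parts.append("</dl>")
    return "\n".join(parts)


if __name__ == "__main__":
    master = "Q: What is it?\nA: A tool.\n\nQ: Is it free?\nA: Yes."
    print(faq_to_html(master))
```

On a server where you control the hooks, a script like this could refresh the published page after every relevant commit. On a canned site where you cannot install hooks, you are left with the workaround described above: commit the generated HTML alongside the master file.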
Additional content appearing in this section has been removed.
Chapter 4: Social and Political Infrastructure
The first questions people usually ask about free software are "How does it work? What keeps a project running? Who makes the decisions?" I'm always dissatisfied with bland responses about meritocracy, the spirit of cooperation, code speaking for itself, etc. The fact is, the question is not easy to answer. Meritocracy, cooperation, and running code are all part of it, but they do little to explain how projects actually run on a day-to-day basis, and say nothing about how conflicts are resolved.
This chapter tries to show the structural underpinnings successful projects have in common. I mean "successful" not just in terms of technical quality, but also operational health and survivability. Operational health is the project's ongoing ability to incorporate new code contributions and new developers, and to be responsive to incoming bug reports. Survivability is the project's ability to exist independently of any individual participant or sponsor—think of it as the likelihood that the project would continue even if all of its founding members were to move on to other things. Technical success is not hard to achieve, but without a robust developer base and social foundation, a project may be unable to handle the growth that initial success brings, or the departure of charismatic individuals.
There are various ways to achieve this kind of success. Some involve a formal governance structure, by which debates are resolved, new developers are invited in (and sometimes out), new features planned, and so on. Others involve less formal structure, but more conscious self-restraint, to produce an atmosphere of fairness that people can rely on as a de facto form of governance. Both ways lead to the same result: a sense of institutional permanence, supported by habits and procedures that are well understood by everyone who participates. These features are even more important in self-organizing systems than in centrally controlled ones, because in self-organizing systems, everyone is conscious that a few bad apples can spoil the whole barrel, at least for a while.
Additional content appearing in this section has been removed.
Forkability
The indispensable ingredient that binds developers together on a free software project, and makes them willing to compromise when necessary, is the code's forkability: the ability of anyone to take a copy of the source code and use it to start a competing project, known as a fork. The paradoxical thing is that the possibility of forks is usually a much greater force in free software projects than actual forks, which are very rare. Because a fork is bad for everyone (for reasons examined in detail in Section 8.6 in Chapter 8), the more serious the threat of a fork becomes, the more willing people are to compromise to avoid it.
Forks, or rather the potential for forks, are the reason there are no true dictators in free software projects. This may seem like a surprising claim, considering how common it is to hear someone called the "dictator" or "tyrant" in a given open source project. But this kind of tyranny is special, quite different from the conventional understanding of the word. Imagine a king whose subjects could copy his entire kingdom at any time and move to the copy to rule as they see fit. Would not such a king govern very differently from one whose subjects were bound to stay under his rule no matter what he did?
This is why even projects that are not formally organized as democracies are, in practice, democracies when it comes to important decisions. Replicability implies forkability; forkability implies consensus. It may well be that everyone is willing to defer to one leader (the most famous example being Linus Torvalds in Linux kernel development), but this is because they choose to do so, in an entirely non-cynical and non-sinister way. The dictator has no magical hold over the project. A key property of all open source licenses is that they do not give one party more power than any other in deciding how the code can be changed or used. If the dictator were to suddenly start making bad decisions, there would be restlessness, followed eventually by revolt and a fork. Except, of course, things rarely get that far, because the dictator compromises first.
Additional content appearing in this section has been removed.
Benevolent Dictators
The benevolent dictator model is exactly what it sounds like: final decision-making authority rests with one person who, by virtue of personality and experience, is expected to use it wisely.
Although "benevolent dictator" (or BD) is the standard term for this role, it would be better to think of it as "community-approved arbitrator" or "judge". Generally, benevolent dictators do not actually make all the decisions, or even most of the decisions. It's unlikely that one person could have enough expertise to make consistently good decisions across all areas of the project, and anyway, quality developers won't stay around unless they have some influence on the project's direction. Therefore, benevolent dictators commonly do not dictate much. Instead, they let things work themselves out through discussion and experimentation whenever possible. They participate in those discussions themselves, but as regular developers, often deferring to an area maintainer who has more expertise. Only when it is clear that no consensus can be reached, and that most of the group wants someone to guide the decision so that development can move on, do they put their foot down and say "This is the way it's going to be." Reluctance to make decisions by fiat is a trait shared by virtually all successful benevolent dictators; it is one of the reasons they manage to keep the role.
Being a BD requires a combination of traits. It needs, first of all, a well-honed sensitivity to one's own influence in the project, which in turn brings self-restraint. In the early stages of a discussion, one should not express opinions and conclusions with so much certainty that others feel like it's pointless to dissent. People must be free to air ideas, even stupid ideas. It is inevitable that the BD will post a stupid idea from time to time too, of course, and therefore the role also requires an ability to recognize and acknowledge when one has made a bad decision—though this is simply a trait that any good developer should have, especially if she stays with the project a long time. But the difference is that the BD can afford to slip from time to time without worrying about long-term damage to her credibility. Developers with less seniority may not feel so secure, so the BD should phrase critiques or contrary decisions with some sensitivity for how much weight her words carry, both technically and psychologically.
Additional content appearing in this section has been removed.
Consensus-Based Democracy
As projects get older, they tend to move away from the benevolent dictatorship model and toward more openly democratic systems. This is not necessarily out of dissatisfaction with a particular BD. It's simply that group-based governance is more "evolutionarily stable," to borrow a biological metaphor. Whenever a benevolent dictator steps down, or attempts to spread decision-making responsibility more evenly, it is an opportunity for the group to settle on a new, non-dictatorial system, to establish a constitution, as it were. The group may not take this opportunity the first time, or the second, but eventually they will; once they do, the decision is unlikely ever to be reversed. Common sense explains why: if a group of N people were to vest one person with special power, it would mean that N-1 people were each agreeing to decrease their individual influence. People usually don't want to do that. Even if they did, the resulting dictatorship would still be conditional: since the group anointed the BD, the group could clearly depose the BD. Therefore, once a project has moved from leadership by a charismatic individual to a more formal, group-based system, it rarely moves back.
The details of how these systems work vary widely, but there are two common elements: one, the group works by consensus most of the time; two, there is a formal voting mechanism to fall back on when consensus cannot be reached.
Consensus merely means an agreement that everyone is willing to live with. It is not an ambiguous state: a group has reached consensus on a given question when someone proposes that consensus has been reached, and no one contradicts the assertion. The person proposing consensus should, of course, state specifically what the consensus is, and what actions would be taken in consequence of it, if they're not obvious.
Most conversation in a project is on technical topics, such as the right way to fix a certain bug, whether or not to add a feature, how strictly to document interfaces, etc. Consensus-based governance works well because it blends seamlessly with the technical discussion itself. By the end of a discussion, there is often general agreement on what course to take. Someone will usually make a concluding post, which is simultaneously a summary of what has been decided and an implicit proposal of consensus. This provides a last chance for someone else to say, "Wait, I didn't agree to that. We need to hash this out some more."
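The consensus rule described above can be stated precisely. A proposal that says what has been agreed stands unless someone objects. The following minimal Python sketch encodes only that rule. The Reply type and its objects flag are hypothetical. The sketch deliberately ignores real-world details, such as how long to wait for objections or how to tell an objection from a clarifying question.

```python
# Minimal sketch of the consensus rule: a proposal that states what has been
# agreed stands unless someone contradicts it. The Reply type is hypothetical.
from dataclasses import dataclass
from typing import List


@dataclass
class Reply:
    author: str
    # True for a reply that says, in effect, "Wait, I didn't agree to that."
    objects: bool


def consensus_reached(proposal: str, replies: List[Reply]) -> bool:
    """Return True if a stated consensus drew no objections."""
    if not proposal.strip():
        # The proposer must say specifically what the consensus is.
        return False
    # Silence counts as agreement; a single objection reopens the discussion.
    return not any(reply.objects for reply in replies)


if __name__ == "__main__":
    summary = "Consensus: we document every public function before 2.0."
    print(consensus_reached(summary, [Reply("ann", False)]))  # prints True
    print(consensus_reached(summary, [Reply("ann", False), Reply("raj", True)]))  # prints False
```

Because silence counts as agreement, anyone who disagrees has to say so. If the reopened discussion still fails to converge, the project falls back on the formal voting mechanism mentioned above.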
Additional content appearing in this section has been removed.
Writing It All Down
At some point, the number of conventions and agreements floating around in your project may become so great that you need to record them somewhere. In order to give such a document legitimacy, make it clear that it is based on mailing list discussions and on agreements already in effect. As you compose it, refer to the relevant threads in the mailing list archives, and whenever there's a point you're not sure about, ask again. The document should not contain any surprises: it is not the source of the agreements, merely a description of them. Of course, if it is successful, people will start citing it as a source of authority in itself, but that just means it reflects the overall will of the group accurately.
This is the document alluded to in Section 2.2.9 in Chapter 2. Naturally, when the project is very young, you will have to lay down guidelines without the benefit of a long project history to draw on. But as the development community matures, you can adjust the language to reflect the way things actually turn out.
Don't try to be comprehensive. No document can capture everything people need to know about participating in a project. Many of the conventions a project evolves remain forever unspoken, never mentioned explicitly, yet adhered to by all. Other things are simply too obvious to be mentioned, and would only distract from important but non-obvious material. For example, there's no point writing guidelines like "Be polite and respectful to others on the mailing lists, and don't start flame wars," or "Write clean, readable, bug-free code." Of course these things are desirable, but since there's no conceivable universe in which they might not be desirable, they are not worth mentioning. If people are being rude on the mailing list, or writing buggy code, they're not going to stop just because the project guidelines said to. Such situations need to be dealt with as they arise, not by blanket admonitions to be good. On the other hand, if the project has specific guidelines about how to write good code, such as rules about documenting every API in a certain format, then those guidelines should be written down as completely as possible.
Additional content appearing in this section has been removed.
Chapter 5: Money
This chapter examines how to bring funding into a free software environment. It is aimed not only at developers who are paid to work on free software projects, but also at their managers, who need to understand the social dynamics of the development environment. In the sections that follow, the addressee ("you") is presumed to be either a paid developer, or one who manages such developers. The advice will often be the same for both; when it's not, the intended audience will be made clear from context.
Corporate funding of free software development is not a new phenomenon. A lot of development has always been informally subsidized. When a system administrator writes a network analysis tool to help him do his job, then posts it online and gets bug fixes and feature contributions from other system administrators, what's happened is that an unofficial consortium has been formed. The consortium's funding comes from the sysadmins' salaries, and its office space and network bandwidth are donated, albeit unknowingly, by the organizations they work for. Those organizations benefit from the investment, of course, although they may not be institutionally aware of it at first.
The difference today is that many of these efforts are being formalized. Corporations have become conscious of the benefits of open source software, and started involving themselves more directly in its development. Developers, too, have come to expect that really important projects will attract at least donations, and possibly even long-term sponsors. While the presence of money has not changed the basic dynamics of free software development, it has greatly changed the scale at which things happen, both in terms of the number of developers and time-per-developer. It has also had effects on how projects are organized, and on how the parties involved in them interact. The issues are not merely about how the money is spent, or how return on investment is measured. They are also about management and process: how can the hierarchical command structures of corporations and the semi-decentralized volunteer communities of free software projects work productively with each other? Will they even agree on what "productively" means?
Additional content appearing in this section has been removed.
Types of Involvement
There are many different reasons open source projects get funded. The items in this list aren't mutually exclusive; often a project's financial backing will result from several, or even all, of these motivations:
Sharing the burden
Separate organizations with related software needs often find themselves duplicating effort, either by redundantly writing similar code in-house, or by purchasing similar products from proprietary vendors. When they realize what's going on, the organizations may pool their resources and create (or join) an open source project tailored to their needs. The advantages are obvious: the costs of development are divided, but the benefits accrue to all. Although this scenario seems most intuitive for non-profits, it can make strategic sense even for for-profit competitors.
Augmenting services
When a company sells services that depend on, or are made more attractive by, particular open source programs, it is naturally in that company's interests to ensure those programs are actively maintained.
Example: CollabNet's (http://www.collab.net/) support of Subversion (http://subversion.tigris.org/). Disclaimer: that's my day job, but it's also a perfect example of this model.
Supporting hardware sales
The value of computers and computer components is directly related to the amount of software available for them. Hardware vendors—not just whole-machine vendors, but also makers of peripheral devices and microchips—have found that having high-quality free software to run on their hardware is important to customers.
Undermining a competitor
Sometimes companies support a particular open source project as a means of undermining a competitor's product, which may or may not be open source itself. Eating away at a competitor's market share is usually not the sole reason for getting involved with an open source project, but it can be a factor.
Additional content appearing in this section has been removed.
Hire for the Long Term
If you're managing programmers on an open source project, keep them there long enough that they acquire both technical and political expertise—a couple of years, at a minimum. Of course, no project, whether open or closed-source, benefits from swapping programmers in and out too often. The need for a newcomer to learn the ropes each time would be a deterrent in any environment. But the penalty is even stronger in open source projects, because outgoing developers take with them not only their knowledge of the code, but also their status in the community and the human relationships they have made there.
The credibility a developer has accumulated cannot be transferred. To pick the most obvious example, an incoming developer can't inherit commit access from an outgoing one (see Section 5.5 later in this chapter), so if the new developer doesn't already have commit access, he will have to submit patches until he does. But commit access is only the most measurable manifestation of lost influence. A long-time developer also knows all the old arguments that have been hashed and rehashed on the discussion lists. A new developer, having no memory of those conversations, may try to raise the topics again, leading to a loss of credibility for your organization; the others might wonder "Can't they remember anything?" A new developer will also have no political feel for the project's personalities, and will not be able to influence development directions as quickly or as smoothly as one who's been around a long time.
Train newcomers through a program of supervised engagement. The new developer should be in direct contact with the public development community from the very first day, starting off with bug fixes and cleanup tasks, so he can learn the code base and acquire a reputation in the community, yet not spark any long and involved design discussions. All the while, one or more experienced developers should be available for questioning, and should be reading every post the newcomer makes to the development lists, even if they're in threads that the experienced developers normally wouldn't pay attention to. This will help the group spot potential rocks before the newcomer runs aground. Private, behind-the-scenes encouragement and pointers can also help a lot, especially if the newcomer is not accustomed to massively parallel peer review of his code.
Additional content appearing in this s