Let’s start our look at standards with three little questions:
How did standards enrich or limit President Barack Obama’s activities during his first day in office?
How could standards have saved the twin children of movie actor Dennis Quaid from serious injury?
Why is a lack of standards making it hard to repair the U.S. Navy’s much-heralded Nimitz nuclear aircraft carrier?
The answers will show why standards are relevant in many situations, and demonstrate the importance of the government using truly open standards in its digital media and processes.
Let’s start with Barack Obama. On his first day in office, he issued two memoranda, one about transparency and open government (see Appendix A) and another about the Freedom of Information Act (FOIA). Despite the historic importance of these documents, almost nobody would be able to answer a simple question: what brand and model of pen did the president use to sign those memoranda?
Of course, almost nobody knows the answer to this question because nobody cares or needs to care. This brings us to the two really important questions: what conditions make it irrelevant which pen the president uses? And crucially, what conditions could change the situation so that the tools he uses to write or sign a document suddenly matter?
Before answering these questions, let’s look at what happened to actor Dennis Quaid’s children. In November 2007, his two-week-old twins nearly died after being given a drug at 1,000 times the recommended dose for newborns. Later, Quaid asked for “a technological way to track the life-and-death decision making in medicine” since “100,000 people are killed every year because of medical mistakes,” and created the Quaid Foundation to tackle the problem. In another recent story, while being treated for cancer, former U.S. Rep. Billy Tauzin ran into a closely related problem: for six months he had to fill out the same forms every time he went to a new hospital or test center, and he also had an unnecessary operation because the surgeons didn’t know about earlier operations.
As for the U.S.S. Nimitz, launched in 1972 and considered a hallmark of American military excellence, she’s still in pretty good shape. Which is lucky because some of the technical diagrams that explain how to fix the reactors and other critical systems are blurry when viewed on computer monitors. It turns out that the diagrams were stored in a file format that today’s computer programs do not completely understand. Reassuring, isn’t it?
These examples may seem totally unrelated, but they contain a common link. Every aspect of our existence is managed and mediated by data, documents, and communications that are increasingly digital: your civil rights and the quality of your own life heavily depend on how software is used around you. This data includes almost everything informational, from the critically important (databases, government reports, regulations, TV broadcasts, blueprints, maps, and contracts) to the casual (blog entries, home movies, and music).
Unfortunately, although the technology to handle these documents has made huge advances in the past 100 years, we often use software, or let it be used, in the wrong way. The software we use to manage government documents, the treatment plans of the Quaid children and Rep. Tauzin, and the design specifications of the Nimitz carrier is much less reliable and, in some ways, much less technically sophisticated than the old-fashioned pen Obama used to sign the memoranda.
To really understand the nature of the problem, we need to step back and establish a few simple definitions. All the forms of data I mentioned earlier (which I’ll just refer to as “documents” for the sake of simplicity) are increasingly being created, processed, distributed, and read digitally. But just what is a digit?
A digit is a single character in a numbering system. Internally, computers can generate, recognize, and store only two states, such as the presence or absence of a small electric charge; the unit of information that records one of those two states is called a bit. Consequently, they can represent only two digits, 1 or 0, just like we’d be forced to do if we had only one hand with only one finger. Commands, signals, and data are called digital when they are translated into series of ones and zeros. Normally, the bits are bundled in groups of eight called bytes.
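To make the idea of bits and bytes concrete, here is a minimal Python sketch. The function names `to_bits` and `from_bits` are invented for this illustration, and the choice of the UTF-8 text encoding is an assumption; the point is only that a short text becomes a series of ones and zeros bundled in groups of eight, and can be turned back into text.

```python
def to_bits(text: str) -> str:
    """Return the UTF-8 bytes of `text`, each written as a group of eight bits."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def from_bits(bits: str) -> str:
    """Reverse to_bits: read each 8-bit group as one byte, then decode the bytes."""
    return bytes(int(group, 2) for group in bits.split()).decode("utf-8")


if __name__ == "__main__":
    encoded = to_bits("Hi")
    print(encoded)             # 01001000 01101001
    print(from_bits(encoded))  # Hi
```

Each letter of “Hi” occupies one byte here; the same sixteen bits could just as well stand for a number or two pixels of an image, which is why the rules for interpreting them matter so much.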
When done right, digitization is good. It reduces every kind of data management to operations on bit sequences, which in turn are easy to manage with computers. If all conceivable kinds of documents (from texts to music, maps, images, and 3D models) can be represented as series of bits, we need only one class of generic, completely interchangeable devices to store them. Back in the twentieth century, we couldn’t save love letters or movies on an LP album, nor could we preserve live music on sheets of paper. Today, instead, flash cards made for digital cameras will store PhD theses, songs, or tax forms without ever noticing that they aren’t photographs. For the same reason, if everything is digital we can get rid of the telephone systems, the TV and radio broadcasting systems, the telegraph, and so forth and employ just one (very large) class of telecom networks to act as bit transporters. The cost and time savings enabled by this approach to information management are so big that the trend toward digitization is unstoppable.
However, digitization has several traps.
Everything we do to make meaning out of bits—to turn a VoIP transmission into our child’s beloved voice, to display a legal document for editing, to check Google Maps for a location—involves a specification that says what each group of bits means and how they should follow one another. Digital documents require complete format specifications to remain usable, now and in the future. For the same reasons, clearly defined rules known as protocols are necessary when bits travel between systems, whether as email or as computer animation.
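A small sketch may help show why the specification matters. In the Python below, the same four bytes are read under three different sets of rules; the function name `interpret` and the sample bytes are made up for this illustration. Nothing in the bytes themselves says which reading is the intended one. Only a format specification does.

```python
import struct


def interpret(raw: bytes) -> dict:
    """Read the same four bytes according to three different 'specifications'."""
    return {
        # Rule 1: every byte is one ASCII character.
        "ascii_text": raw.decode("ascii"),
        # Rule 2: the four bytes form one unsigned integer, most significant byte first.
        "big_endian_uint": struct.unpack(">I", raw)[0],
        # Rule 3: the same integer layout, but least significant byte first.
        "little_endian_uint": struct.unpack("<I", raw)[0],
    }


if __name__ == "__main__":
    sample = bytes([0x4F, 0x70, 0x65, 0x6E])
    for rule, value in interpret(sample).items():
        print(f"{rule}: {value}")
```

A program that assumes the wrong rule will not report an error; it will simply produce a different, plausible-looking value, which is how documents end up silently garbled.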
Theoretically, agreement on file formats and protocols is all that is needed for different computers and software programs to work together, no matter how the data is generated, stored, or transmitted: programs on one remote computer could automatically retrieve data from other computers, process the data in real time, and send the result—for example, the best deal on an airplane ticket—directly to your home computer.
In the real world, legal restrictions and implementation issues impair the value of file formats and communication protocols. Companies can change them unexpectedly and use legal means to prevent anybody they choose from using their formats. Even where good will prevails, ambiguities can lead to incompatible products. Thus, format and protocol specifications have real value for users only when ratified as official standards which everybody can reuse without legal restriction or paying any fees. When they choose to, governments can mandate standards of this kind as compatibility requirements in public requests for proposals, and can have confidence that such standards provide high-quality features, reliability, and real interoperability both now and in the future.
The obsolescent media problem is hardware-related. Digital storage media are much more fragile than nondigital ones: parchment lasts millennia when handled well, hard drives just a few years. Furthermore, digital media go out of date as new and better ones are invented—for instance, lots of people stored documents on floppy disks in the 1980s and 1990s, but hardly any computer systems can be found now with floppy disk drives.
The second problem is much more serious. Even when the container works perfectly, bit sequences are absolutely useless if you don’t know what they mean, and if the instructions you need to read or translate them are lost or too expensive to buy.
These aren’t hypotheses. Almost all the files created by public and private businesses around the world are already encoded in a way that only one suite of programs, from one single, for-profit company, can read without compatibility problems. What if that company went belly up? Think it’s too big to fail? Isn’t this what everybody would have said in 2008 about Lehman Brothers, General Motors, or Chrysler?
According to Jerome P. McDonough, assistant professor in the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign, the total amount of data in files of all types, from “government records to tax files, email, music, and photos,” that could be lost due to “ever-shifting platforms and file formats” is about 369 billion billion bytes, that is, 369 exabytes. (As a reference, the size of this chapter is less than 30,000 bytes.)
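The scale of that figure is easier to grasp with a little arithmetic. The sketch below uses only the two numbers quoted above; the variable names are invented for the illustration, and the chapter size is taken as its 30,000-byte upper bound, so the result is a lower bound on the number of chapter-sized documents.

```python
# "369 billion billion" bytes: 369 x 10^9 x 10^9
AT_RISK_BYTES = 369 * 10**9 * 10**9

# Upper bound on the size of this chapter, as quoted in the text
CHAPTER_BYTES = 30_000

# One exabyte is 10^18 bytes
exabytes = AT_RISK_BYTES // 10**18

# How many chapter-sized documents would fit in the data at risk
chapters = AT_RISK_BYTES // CHAPTER_BYTES

if __name__ == "__main__":
    print(f"{exabytes} exabytes")
    print(f"at least {chapters:,} chapter-sized documents")
```

In other words, the data at risk corresponds to more than twelve million billion documents the size of this chapter.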
The Nimitz diagrams are locked inside files whose format, being unknown, can’t be decoded with modern software. This is not an isolated example; all over the world, billions of designs, from furniture to water purification systems, bridges and buildings, plane and car parts, are stored in a format that only the few developers of one program ever knew how to read without errors.
We can’t go back to the predigital era. It would be stupid to do so. But if we don’t start managing digital data and communications the right way—with a view toward both real interoperability and future readability—both private and public life will become harder to manage.
Luckily, many governments are aware of these hardware and software problems, but the only recourse they’ve found is precisely the one I just derided: sticking to nondigital media. Most national archives, and many other public and private organizations around the world, still waste a lot of money and resources because they don’t feel safe depending only on digital documents for long-term storage. For example, the Virginia State library “cannot accept records for permanent storage on digital media at this time due to the lack of hardware and software standards.” Consequently, “Electronic records identified as permanent…must be converted to archival quality microfilm or alkaline paper before being transferred to the Library” (http://www.archiveindex.com/laws/law-va.htm). What if, 20, 30, or 40 years from now, the digital records of your pension payments were unreadable? What is the benefit of digital documents for a small business, if it must continuously update software and hardware without any need except to maintain archives, or continue to (re)enter data by hand in incompatible systems?
The health care system experiences the worst of the situation, suffering from both high costs and subpar care. For example, Rep. Tauzin explicitly complained that none of the hospitals he visited were able to share digital records with one another.
Many governments worldwide are fighting the same battle, on a much bigger scale. In the United States, the George W. Bush administration left behind 100 trillion bytes of electronic records. That’s 50 times as much as President Clinton left in 2001, but surely much less than what the Obama administration will produce. Already, the Bush archives, which include historical documents such as top-secret email tracing plans for the Iraq war, contain data in “formats not previously dealt with” by the U.S. National Archives.
So, to come full circle, if Obama’s pen were like digital media, anyone wanting to read the memoranda would have to buy the same kind of pen. Not much openness or freedom of information in that!
There are several reasons for this mess, but a particularly important one is our ignorance as a society. Software is still so new in our culture that most of us (including many people who consider themselves “experts” because they spend lots of time using office suites, computer games, or social networks) haven’t actually realized yet the roles played by formats and protocols, and how they can run against our interests.
Consider how people refer to office files. Nobody would talk about a handwritten letter by mentioning the name of the pen used to write it; saying “I sent you a Bic letter” or a “Mont-Blanc letter” would be a sure way to have everyone laugh at you. Yet most people regularly say “I’ll send you a PowerPoint” or “I need to check the figures in that Excel file,” which is the same thing, but with no embarrassment.
Such phrases would trigger concern if the public knew why and how software is different from Obama’s pen. Not only do people use the software without regard for compatibility and future access, but worse still, they get schoolchildren hooked on that software because “everybody else does it,” or so that the children learn what advertising claims is the best the industry offers, or to “have more opportunities.” This is at least counterproductive, if not actually dangerous.
In every generation, automobile companies go out of business. It becomes difficult to buy spare parts for existing cars, but at least the disappearance of the product line has no effect on your ability to buy and drive cars in the future. You don’t lose all your memories of trips made with the old car, or have more trouble dealing with the businesses you drove to. And your new car need not be “compatible” with any other, old or new.
But when a software company goes out of business, or simply discontinues a product, all the documents you created with it could go out of your reach for good. All it takes is a switch to a new computer. (Modern proprietary software licenses make it hard to run an old program on a new computer even if they’re technically compatible.) Such software is similar to a nuclear plant without any waste management policy, or to depleted uranium weapons: it hurts people who weren’t there when it was used, for a long time afterward. A company or government agency that uses software with nonstandard formats constrains, for no real reason and for all of history, not only its own choices but also those of everybody who interacts with it.
Software developers have two ways to make their users come back for a new version of their programs. One is to keep writing software that’s actually better than the previous version: faster, easier to use, more flexible, and with support for new contexts such as the Web. The other way is not to struggle for improved quality, but to create secret file formats or protocols and change them without a really valid reason every year. People who stick to the old versions of the software find they can’t do business with people who bought the new version, so everyone is forced to upgrade.
Once a movie, contract, or business report has been saved in a format that can be read by only one software program, you can forget copyright. That document now belongs to the developer or company that developed that program. If you still want it, you must accept their conditions. That’s how Word/Excel/PowerPoint and AutoCAD became de facto monopolists in their respective markets: their file formats, not the software itself, are secret. People who were already using those programs could not get rid of them without losing the files they had already created and distributed to other people, who, in turn, were forced to buy the same programs to open them, and so on. Had the file formats been really usable with other programs, no one would have cared about those programs being secret.
Which is why I declared at the beginning of this chapter that software is less sophisticated than pens, because pens create none of these problems. What they produce is 100% guaranteed compatible with all other pens and sheets of paper in the world. You don’t need to own the same pens as Obama to read what he writes, or to write a letter to him. There are two foundations for this openness: first, pens are tools that are completely independent from the document format, which in their case is the alphabet, that is the shape and meaning of the characters in which languages are written. Second, alphabets are not secret and no one needs permission to use them. Software should work in the same open way.
Standards are meant to ensure that data can be accessed in a variety of ways so that no single program or software vendor is indispensable. There’s an art and a science to writing standards, of course. If they’re ambiguous, incomplete, or poorly written, they won’t do their job. That’s why standards committees sign up a wide variety of experts to write standards, and it takes years to do.
Formats and protocols are often more important than software, because most programs are worthless without other programs to talk to (imagine if you were the only person in the world with an email program) or without data to process (like if you had a word processor that couldn’t open or save a file). We run software to manage data, not the other way around. The only way to guarantee that our data remains ours, and always immediately available, is to store it in file formats which are really independent from any single software product.
Democracy implies accountability, efficiency, optimal usage of public money, and transparency in all public operations and services, regardless of whether they are managed by the private or public sector: in a word, openness. Software and digital data can help tremendously to achieve these and other crucial goals.
For example, according to “Standards and the Smart Grid: The U.S. Experience,” “increased use of digital information” is one of the essential prerequisites for building the smart energy grid that will help to decrease U.S. dependence on foreign energy and fuel job creation. Getting hundreds of companies around the continent to share this information requires open, standard formats.
There are huge efforts these days to digitize individual medical histories, drug records, test results, and surgeries all in one big file for each individual, called an electronic health record (EHR). Personal EHRs could help to greatly reduce paperwork, treatment costs, and time spent in hospitals and labs, and will facilitate people moving from one city, health insurance company, or service provider to another. In contrast to the ordeals of Quaid’s children and Rep. Tauzin, doctors could always make the best decisions for your health in the safest, fastest, and cheapest way possible. As long as their computers can read your EHR, of course.
Publishing online without legal restrictions raw data such as maps, census records, weather surveys, agricultural statistics, court rulings, and agency budgets (while protecting citizens’ privacy, of course) makes two wonderful things possible. One is the generation of new wealth: if both public agencies and private businesses can freely use all that data to make better decisions and offer new services, they’ll minimize their expenses and make more money. This will both stimulate the economy and increase the tax base. The other advantage of correctly publishing public raw data online is much more control by private citizens over their governments, as well as closer cooperation with them.
Having such data online makes it possible for civic-minded programmers to finally build and use “follow the money” search engines. Everybody could use or develop interfaces such as Google Squared to display, all in one table, things such as who got money from a public contract, who approved it, all the present and past relationships among those people (such as sitting on the boards of the same companies), the percentage of contracts assigned to some firm from each public officer, and so on. It would be much easier for everybody to visualize how numbers, decisions, and physical places are related. You could generate on-the-spot maps that show how tax money moves from one county to another and why, and how it varies over time with the party in power. Residents of each town could see without intermediaries how demographics and pollution sources in any given area increase the occurrence of some specific illness. It would also become much easier to contribute data into these systems, which makes them more useful to public administrators.
Demanding that all public administrations and schools, at all levels, accept and store office files only in nonproprietary standard formats such as OpenDocument (the only viable alternative today to the forced upgrades caused by the continuous changes in .doc, .xls, and .ppt file formats) would leave all their partners free to use whatever office software they like best. At the same time, it would protect the pockets and freedom of choice of millions of small businesses, schools, and students who can’t afford the licensing costs of “industry-standard” word processors.
In principle, the current U.S. administration is in favor of going digital this way. The Obama stimulus package signed in February 2009 provided $19 billion to bring hospitals the benefits of digital technology. The “Transparency and Open Government” memo includes statements such as the following:
Government should be transparent. Transparency promotes accountability and provides information for citizens about what their Government is doing. Information maintained by the Federal Government is a national asset. My Administration will take appropriate action, consistent with law and policy, to disclose information rapidly in forms that the public can readily find and use. Government should be collaborative. Collaboration actively engages Americans in the work of their Government. Executive departments and agencies should use innovative tools, methods, and systems to cooperate among themselves, across all levels of Government, and with nonprofit organizations, businesses, and individuals in the private sector.
The truth, however, is that these and many other things, including FOIA, will be technically possible only by mandating the use of open, standard formats.
To enshrine open standards in government and make sure they are robustly implemented, governments should lean whenever possible toward free/libre/open source software (FLOSS). As described in Chapter 32, FLOSS code is available to everybody without any royalty or legal restriction. Everybody can install as many copies of the program as they wish, or create and redistribute, under the same conditions, custom copies of that program starting from the source code.
Still, FLOSS is not enough to guarantee that owners of documents will always be able to read them, because the original source code might be lost or fail to work on newer computer systems. In such cases, files become unreadable not because of software licenses, but simply because their authors never bothered to demand that the programmers use fully documented file formats. That’s why it’s important to stick to really open standards that exist and are defined regardless of any specific software program, regardless of its license.
So, FLOSS is an important step toward open government, but truly open formats and protocols are often even more important, because most programs are worthless without other programs with which to talk, or without data to process. We run software to manage data, not the other way around. Open formats and protocols are standards whose complete specification is published in enough detail that any programmer can, without royalties or other conditions, write new software fully compatible with that format or protocol. Such standards don’t rely on any proprietary subcomponents, are developed through consensus and experimentation, and are maintained by a recognized international, nonprofit community. Only standards such as these give real guarantees that our data remains ours and that its formats will remain readable, while no one can exploit them to lock in users and exclude competition.
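What “any programmer can write new software fully compatible with that format” means in practice can be sketched with OpenDocument, the format mentioned earlier. Its published specification describes a text document as a ZIP package whose `content.xml` member holds paragraphs in documented XML elements. The Python below relies only on that published structure and on the standard library, with no office suite involved. The helper names are invented for this illustration, and `make_stand_in` builds a deliberately simplified stand-in (it omits the manifest and styles that a complete, valid OpenDocument file contains), so treat it as a sketch rather than a conforming implementation.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Namespaces published in the OpenDocument specification
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def make_stand_in(paragraphs):
    """Build a simplified in-memory stand-in for an .odt package (illustration only)."""
    body = "".join(f"<text:p>{escape(p)}</text:p>" for p in paragraphs)
    content = (
        f'<office:document-content xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">'
        f"<office:body><office:text>{body}</office:text></office:body>"
        "</office:document-content>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        package.writestr("content.xml", content)
    return buffer.getvalue()


def read_paragraphs(package_bytes):
    """Extract paragraph text using only the documented package layout."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        root = ET.fromstring(package.read("content.xml"))
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]


if __name__ == "__main__":
    data = make_stand_in(["Government should be transparent."])
    print(read_paragraphs(data))
```

The reader and the writer here share nothing except the published names and layout, which is exactly the property that keeps a document readable after the program that created it is gone.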
So-called de facto or industry standards often aren’t open. They often belong to one (usually for-profit) company. Even when they are entirely published, their owners can change them at will, whenever they feel like it, and without informing everybody else of which changes were made. In other cases, you need explicit permissions to use the standard. Such standards may even be created just to stifle competition: a company may create a specification that describes the file format incompletely and with proprietary features that only it can provide, and then lobby to have it recognized as a standard. This is a mock standard, because no one else can develop software that really works with the format.
This is relevant because conformance to some standard is often (rightly!) a mandatory requirement in contracts for information and computer technology products and services paid with your tax money.
There is another reason why relying only on adoption of FLOSS to keep everything open, instead of starting from truly open standards as defined earlier, is not the optimal solution. File formats should be as few as possible and as stable as possible. FLOSS makes it always possible to convert data from one format to another, but why create the need for conversion if it isn’t absolutely necessary? Think of software as pens, and formats as alphabets. We went from quills to email in just a few centuries exactly because the alphabets remained practically unchanged, allowing each generation to learn and build on what already existed rather than rewrite every manuscript in a different way every few years. Innovation whose impact is limited to internal software features is less of a problem, as it leaves documents readable by everybody. An insistence on open formats and protocols (which can be used also by proprietary software) will actually stimulate developers to improve the user experience and other aspects of their software (aspects that are independent of the standards) rather than try to dominate the market through their control over formats.
In other words, insistence on open standards for file formats and protocols will also make it much easier to evaluate software programs according to their actual merits: performance, flexibility, ease of use and customization, documentation quality, and so forth. Think again of pens and alphabets. There is nothing wrong in selling luxury pens made with secret or patented technology, as long as cheap pens can also exist. But the whole thing is contingent on everybody using the same alphabet, without needing to pay fees or learn special secrets.
Digitization is good, but only when it’s open in the ways described in these pages. Governments must lead the way in this goal, both by example and by enforcing interoperability through really open digital standards, for several reasons:
Without exploiting all the potential of open standards and FLOSS, there can be no open government, no FOIA, no smart energy grids, and no efficient services. Open data and file formats are mandatory to guarantee that all citizens can analyze raw public data or submit their own information, or that data can be retrieved 20, 50, or 100 years later. The first, nonnegotiable step toward any open government policy is therefore to demand that only really open formats and protocols be used for public data and digital interaction with any public administration.
In the modern world, technology (especially digital technology) is legislation. A government that insists on using, or tolerating, closed, secret formats and protocols has abdicated part of its duty to protect individual freedom and equal opportunities, both in business and in education, as well as the hope of reducing costs.
To emphasize the preceding point, open formats save money. Only if there is no vendor lock-in can public agencies, businesses, and individuals get really competitive offers from many providers.
Open formats and protocols are both an extremely profitable investment and an enabler. Compared to reforming pension systems, health care, transportation, energy, pollution, or public education, open formats, protocols, and FLOSS are much quicker and cheaper to adopt. Moreover, since software is so ubiquitous, the adoption of open formats has a positive impact on all those other fields. There is probably no other way to save so much money in so many different places and free vital resources with so (comparatively) little effort than through these technologies, as long as that effort is coordinated, of course. That’s why it must be governments that set the example and constitute the critical mass that makes open standards and FLOSS accessible to everybody.
Marco Fioretti is a freelance writer, an activist, a popularizer, a teacher, and a speaker about open digital standards, Free Software, and digital technologies and their relation to and impact on education, ethics, civil rights, and environmental issues. Marco is the webmaster of Stop/Zona-M, a website designed to help all normal people stop and learn the essentials of, and think about, the things that matter to them. Marco is the author of Family Guide to Digital Freedom and a regular contributor to several print and online ICT magazines. His website is http://mfioretti.com.