Published on O'Reilly (http://oreilly.com/)

Inventing the Future


O'Reilly Emerging Technologies Conference

The 2002 O'Reilly Emerging Technologies Conference explored how P2P and Web services are coming together in a new Internet operating system.

"The future is here. It's just not evenly distributed yet." I recently came across that quote from science-fiction writer William Gibson, and I've been repeating it ever since.

So often, signs of the future are all around us, but it isn't until much later that most of the world realizes their significance. Meanwhile, the innovators who are busy inventing that future live in a world of their own. They see and act on premises not yet apparent to others. In the computer industry, these are the folks I affectionately call "the alpha geeks," the hackers who have such mastery of their tools that they "roll their own" when existing products don't give them what they need.

The alpha geeks are often a few years ahead of their time. They see the potential in existing technology, and push the envelope to get a little (or a lot) more out of it than its original creators intended. They are comfortable with new tools, and good at combining them to get unexpected results.

What we do at O'Reilly is watch these folks, learn from them, and try to spread the word by writing down (or helping them write down) what they've learned and then publishing it in books or online. We also organize conferences and hackathons at which they can meet face to face, and do advocacy to get wider notice for the most important and most overlooked ideas.

So, what am I seeing today that I think the world will be writing about (and the venture capitalists and entrepreneurs chasing) in the years to come?

All of these things come together into what I'm calling "the emergent Internet operating system." The facilities being pioneered by thousands of individual hackers and entrepreneurs will, without question, be integrated into a standardized platform that enables a next generation of applications. (This is the theme of our Emerging Technologies conference in Santa Clara May 13-16, "Building the Internet Operating System.") The question is, who will own that platform?

Both Microsoft and Sun (not to mention other companies like IBM and BEA) have made it clear that they consider network computing the next great competitive battleground. Microsoft's .Net and Sun's Java (from J2ME to J2EE) represent ambitious, massively engineered frameworks for network computing. These network operating systems--and yes, at bottom, that's what they are--are designed to make it easier for mainstream developers to use functions that the pioneers had to build from scratch.

But the most interesting part of the story is still untold, in the work of hundreds or thousands of independent projects that, like a progressively rendered image, will suddenly snap into focus. That's why I like to use the word "emergent." There's a story here that is emerging with increasing clarity.

What's more, I don't believe that the story will emerge whole-cloth from any large vendor. The large vendors are struggling with how to make money from this next generation of computing, and so they are moving forward slowly. But network computing is a classic case of what Clayton Christensen, author of The Innovator's Dilemma, calls a disruptive technology. It doesn't fit easily into existing business models or containers. It will belong to the upstarts, who don't have anything to lose, and the risk-takers among the big companies, who are willing to bet more heavily on the future than they do on the past.

Let's take Web services as an example. Microsoft recently announced they hadn't figured out a business model for Web services, and were slowing down their ambitious plans for building for-pay services. Meanwhile, the hackers, who don't worry too much about business models, but just try to find the shortest path to where they are going, are building "unauthorized" Web services by the thousands.

Spiders (programs that download pages automatically, for purposes ranging from general search engines to specialized shopping-comparison services to market research) are really first-generation Web services, built from the outside in.

Spiders have been around since the early days of the Web, but what's getting interesting is that as the data resources out on the Net get richer, programmers are building more specialized spiders--and here's the really cool bit--sites built with spiders themselves are getting spidered, and spiders are increasingly combining data from one site with data from another.

One developer I know built a carpool planning tool that recommended ridesharing companions by taking data from the company's employee database, then spidering MapQuest to find people who live on the same route.

There are now dozens of Amazon rank spiders that help authors keep track of their books' Amazon rank. We have a very powerful one at O'Reilly that provides many insights valuable to our business that aren't available through the standard Amazon interface. It allows us to summarize and study things like pricing by publisher and topic, rank trends by publisher and topic over a two-year period, correlation between pricing and popularity, relative market share of publishers in each technology area, and so on. We combine this data with other data gleaned from Google link counts on technology sites, traffic trends on newsgroups, and other Internet data, to provide insights into tech trends that far outstrip what's available from traditional market research firms.

There are numerous services that keep track of eBay auctions for interested bidders. Hackers interested in the stock market have built their own tools for tracking pricing trends and program trading. The list goes on and on, an underground data economy in which Web sites are extended by outsiders to provide services that their owners didn't conceive.

Right now, these services are mostly built with brute force, using a technique referred to as "screen scraping." A program masquerading as a Web browser downloads a Web page, uses pattern matching to find the data it wants, keeps that, and throws the rest away. And because it's a program, not a person, the operation is repeated, perhaps thousands of times a minute, until all the desired data is in hand.
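The screen-scraping technique described above can be sketched in a few lines of Python. The HTML snippet and tag structure here are purely illustrative stand-ins for a downloaded page (not Amazon's actual markup), since in practice the page would be fetched over HTTP by a program masquerading as a browser:

```python
import re

# Illustrative stand-in for a page a scraper would download over HTTP.
# The markup below is hypothetical, not any real site's format.
page = """
<html><body>
<b>Sales Rank:</b> 1,234<br>
<b>List Price:</b> $49.95<br>
</body></html>
"""

def scrape(html):
    """Pattern-match the few fields we want; throw the rest away."""
    rank = re.search(r"Sales Rank:</b>\s*([\d,]+)", html)
    price = re.search(r"List Price:</b>\s*\$([\d.]+)", html)
    return {
        "rank": int(rank.group(1).replace(",", "")) if rank else None,
        "price": float(price.group(1)) if price else None,
    }

print(scrape(page))  # keeps a handful of bytes out of a multi-kilobyte page
```

The fragility is apparent: the regular expressions are tied to the page's exact layout, so any cosmetic redesign of the site silently breaks the scraper.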

For example, every three hours, amaBooks, our publishing market research spider, downloads information about thousands of computer books from Amazon. The Amazon Web page for a book like Programming Perl is about 68,000 bytes by the time you include description, reader comments, etc. The first time we discover a new book, we want under a thousand bytes--its title, author, publisher, page count, publication date, price, rank, number of reader reviews, and average value of reader reviews. For later visits we need even less information: the latest rank, the latest number of reviews, and any change to pricing. For a typical run of our spider, we're downloading 24,000,000 bytes of data when we need under 10,000.

Eventually, these inefficient, brute-force spiders, built that way because that's the only way possible, will give way to true Web services. The difference is that a site like Amazon or Google or MapQuest or E*Trade or eBay will not be the unwitting recipient of programmed data extraction, but a willing partner. These sites will offer XML-based APIs that allow remote programmers to request only the data they need, and to re-use it in creative new ways.
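The contrast with scraping is easy to see on the client side. In this sketch, the XML response format is hypothetical (no real site offered this in 2002); the point is that a willing partner returns just the requested fields, parsed with an ordinary XML library rather than brittle pattern matching:

```python
import xml.etree.ElementTree as ET

# Hypothetical XML response from a book-data API -- a few hundred bytes
# carrying exactly the fields the caller asked for, instead of a full
# 68,000-byte HTML page.
response = """<book isbn="0596000278">
  <title>Programming Perl</title>
  <rank>1234</rank>
  <price currency="USD">49.95</price>
  <reviews count="211" average="4.5"/>
</book>"""

book = ET.fromstring(response)
rank = int(book.findtext("rank"))
price = float(book.findtext("price"))
review_count = int(book.find("reviews").get("count"))
print(rank, price, review_count)
```

Because the structure is explicit, the client no longer cares how the site's human-facing pages are laid out; the API is a stable contract.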

Why would a company that has a large and valuable data store open it up in this way?

My answer is a simple one: because if they don't ride the horse in the direction it's going, it will run away from them. The companies that "grasp the nettle firmly" (as my English mother likes to say) will reap the benefits of greater control over their future than those who simply wait for events to overtake them.

There are a number of ways for a company to get benefits out of providing data to remote programmers.

Even though I believe that revenue is possible from turning Web spiders into Web services, I also believe that it's essential that we don't make this purely a business transaction. One of the beauties of the Internet is that it has an architecture that promotes unintended consequences. You don't have to get someone else's permission to build a new service. No business negotiation. Just do it. And if people like what you've done, they can find it and build on it.

As a result, I believe strongly that Web services APIs need to have, at minimum, a low-volume option that remains free of charge. It could be done in the same way that a company like Amazon now builds its affiliates network. A developer signs up online, through a self-service Web interface, for a unique ID that must be presented with each XML-based data request. At low volumes (say 1,000 requests a day), the service is free. This promotes experimentation and innovation. But at higher volumes, which would suggest a service with commercial possibility, pricing needs to be negotiated.
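The metering scheme described above, free up to a daily quota per developer ID, then subject to negotiated pricing, can be sketched as follows. The class name, quota, and key format are all illustrative assumptions:

```python
from collections import defaultdict
from datetime import date

FREE_DAILY_LIMIT = 1000  # hypothetical free-tier quota, per the text

class ApiGate:
    """Sketch of per-key request metering: each developer ID gets a
    daily quota of free requests; calls beyond it are refused (or, in a
    real deployment, routed to a billing arrangement)."""

    def __init__(self, limit=FREE_DAILY_LIMIT):
        self.limit = limit
        self.counts = defaultdict(int)  # (key, day) -> requests so far

    def allow(self, key, today=None):
        today = today or date.today()
        self.counts[(key, today)] += 1
        return self.counts[(key, today)] <= self.limit

# With a toy limit of 3, the fourth request in a day is refused.
gate = ApiGate(limit=3)
results = [gate.allow("dev-42") for _ in range(4)]
print(results)  # -> [True, True, True, False]
```

Keying the counter by (ID, day) means quotas reset automatically at midnight, with no cleanup job needed for the current day's accounting.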

Bit by bit, we'll watch the transformation of the Web services wilderness. The first stage, the pioneer stage, is marked by screen scraping and "unauthorized" special purpose interfaces to database-backed Web sites. In the second stage, the Web sites themselves will offer more efficient, XML-based APIs. (This is starting to happen now.) In the third stage, the hodgepodge of individual services will be integrated into a true operating system layer, in which a single vendor (or a few competing vendors) will provide a comprehensive set of APIs that turns the Internet into a huge collection of program-callable components, and integrates those components into applications that are used every day by non-technical people.


Copyright © 2009 O'Reilly Media, Inc.