Chapter 1. Introduction
Have you ever wanted to publish information on the Web so programs beyond browsers could work with it? Have you ever needed to make two or more computers running different operating systems and programs written in different languages share processing? Have you ever wanted to build distributed applications using tools that let you watch the information moving between computers rather than relying on “package and pray”?
Web services are a set of tools that let you build distributed applications on top of existing web infrastructures. These applications use the Web as a kind of “transport layer” but don’t offer a direct human interface via the browser. Reusing web infrastructures can drastically lower the cost of setting up these applications and allows you to reuse all kinds of tools originally built for the Web.
XML-RPC is among the simplest (and most foolproof) web service approaches, and makes it easy for computers to call procedures on other computers. XML-RPC reuses infrastructure that was originally created for communications between humans to support communications between programs on computers. Extensible Markup Language (XML) provides a vocabulary for describing Remote Procedure Calls (RPC), which are then transmitted between computers using the HyperText Transfer Protocol (HTTP).
XML-RPC can simplify development tremendously and make it far easier for different types of computers to communicate. By focusing on computer-to-computer communications, XML-RPC lets you use web technologies without getting trapped in the focus on human-readable content that has characterized most previous web development. Most of the XML-RPC framework will be familiar to web developers, but as a web developer, you will probably use off-the-shelf packages to connect your programs.
The rest of this book explores this simple, but powerful, approach more thoroughly using various development techniques. Chapter 3 through Chapter 7 explore the XML-RPC libraries available for Java, Perl, PHP, Python, and Active Server Pages, and Chapter 8 takes a look at XML-RPC’s future. But before we can dive into the details of the XML-RPC protocol in Chapter 2, we need to lay some basic groundwork. The rest of this chapter covers what XML-RPC does, where it excels, and when you may not want to use it.
What XML-RPC Does
At the most basic level, XML-RPC lets you make function calls across networks. XML-RPC isn’t doing anything especially new, and that largely explains why XML-RPC is useful. By combining an RPC architecture with XML and HTTP technology, XML-RPC makes it easy for computers to share resources over a network. This means that you can give users direct access to the information they need to process, not just read; reuse systems you’ve already built in new contexts; or mix and match programs so that each can focus on what it does best.
Remote Procedure Calls (RPC)
Remote Procedure Calls (RPC) are a much older technology than the Web. Although the concept of computers calling functions on other systems across a network has been around as long as networks have existed, Sun Microsystems is usually given credit for creating a generic formal mechanism used to call procedures and return results over a network. RPC fit very well with the procedural approach that dominated programming until the 1990s.
Say you have a procedure that calculates momentum. This function knows the speed and name of the object, but it needs to know the object’s mass to calculate the momentum. It needs to call a procedure that returns the mass for a given object. For a local procedure call, this is fairly straightforward. Programming languages let you divide your programs into procedures (or functions or methods) that call one another. The syntax is different, but generally, you can pass parameters and get a result:
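As a minimal sketch of the local case in Python: the function names get_mass and momentum, and the sample masses, are assumptions chosen for illustration and are not defined anywhere in this chapter.

```python
# Hypothetical lookup table; the object names and masses are made up.
MASSES_KG = {"bowling ball": 7.0, "tennis ball": 0.058}


def get_mass(name):
    """Return the mass in kilograms for a named object."""
    return MASSES_KG[name]


def momentum(name, speed_m_per_s):
    """Calculate momentum (mass times speed) with an ordinary local call."""
    return get_mass(name) * speed_m_per_s


if __name__ == "__main__":
    print(momentum("bowling ball", 3.0))
```

Here the caller only needs the procedure's name and parameters; the language runtime takes care of passing the argument and handing back the result.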
Now imagine that the procedure that returns the mass is implemented on a remote system. In this case, calling the procedure requires your program to know a lot more about a more complex process. Your program needs to know which remote system to contact, how to package and send the parameters, how to receive an answer, and how to unpackage and present that answer to the routine that called it.
Although the RPC approach involves considerable extra overhead, with libraries on both sides of the connection creating and processing messages, as well as the possibility of delays in crossing the network, the approach does permit distributed processing and sharing of information.
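The remote case might look something like the following sketch, which uses the xmlrpc modules from Python's standard library. The method name getMass, the sample data, and the choice to run the server in a background thread on a free local port are all assumptions made so the example is self-contained; a real deployment would place the server on another machine.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

MASSES_KG = {"bowling ball": 7.0}  # made-up sample data


def get_mass(name):
    """Procedure exposed to remote callers."""
    return MASSES_KG[name]


def start_mass_server():
    """Start an XML-RPC server on a free local port and return it."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(get_mass, "getMass")  # hypothetical method name
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def remote_momentum(server_url, name, speed_m_per_s):
    """Same calculation as the local case, but the mass lookup crosses the network."""
    with ServerProxy(server_url) as proxy:
        return proxy.getMass(name) * speed_m_per_s


if __name__ == "__main__":
    server = start_mass_server()
    host, port = server.server_address
    print(remote_momentum(f"http://{host}:{port}/", "bowling ball", 3.0))
    server.shutdown()
    server.server_close()
```

The library handles contacting the remote system, packaging the parameters, and unpacking the answer, so the calling code stays close to the local version.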
The RPC approach makes life easy for you as a programmer because it spares you the trouble of having to learn about underlying protocols, networking, and various implementation details. RPC libraries are generally designed to be relatively transparent and are often operated with a single function call rather than a complex API. The abstraction required to implement RPC has another advantage for developers: because there has to be a defined protocol operating underneath the RPC system, it’s possible to create alternate implementations of that protocol that support different environments. Programs written on mainframes, minicomputers, workstations, and personal computers, even from different vendors, could communicate if they had a network in common.
Effectively, RPC gives developers a mechanism for defining interfaces that can be called over a network. These interfaces can be as simple as a single function call or as complex as a large API. RPC is an enabling mechanism, and as a developer you can take as much advantage of it as you like, limited only by network overhead costs and architectural concerns.
Letting Computers Talk: XML and the Web
Although half of XML-RPC’s heritage comes from RPC, the other half comes from the World Wide Web. The Web’s growth over the last decade has been explosive, moving rapidly from techie curiosity to ubiquitous consumer tool. The Web provides an interface that is easy for developers to build but still simple enough for ordinary humans to negotiate. Although the Web was initially a tool for human-to-human communications, it has evolved into a sophisticated interface for human-to-computer interaction, and is also moving into increasingly complex computer-to-computer communications.
As fantastically successful as HTML was, it was only really useful for transactions presenting information to people. As HTML’s limitations became clearer, the World Wide Web Consortium (W3C), keeper of the HTML specification, hosted the development of Extensible Markup Language (XML), a markup language that fits into the same environment as HTML but provides far more flexibility for communications between programs. XML allows developers to create documents whose contents are described far more precisely than is possible in HTML. XML makes it possible to create messages intended for computer interpretation, not just presentation to readers. XML lets you create a set of tags for your data, such as <author> for book catalog information. XML-RPC uses its own set of tags to mark up procedure calls. Because XML was built to fit into the same framework that carries HTML, it has created new possibilities for the Web, XML-RPC among them.
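To get a feel for the tags XML-RPC uses, the following sketch asks Python's standard xmlrpc.client module to mark up a call and then parse it back. The method name getMass and its argument are hypothetical; only the markup structure is the point.

```python
from xmlrpc.client import dumps, loads

# "getMass" and "bowling ball" are illustrative assumptions.
request_xml = dumps(("bowling ball",), methodname="getMass")

# The markup wraps the method name and each parameter in XML-RPC's own tags.
print(request_xml)

# Parsing the markup recovers the parameter tuple and the method name.
params, method = loads(request_xml)
```

Printing request_xml shows a methodCall element containing a methodName and a list of typed param values, which is the vocabulary Chapter 2 covers in detail.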
Reusing Web Protocols and Infrastructure
XML-RPC reuses another key component of the Web, its transport protocol. HTTP has been built into an enormous number of development environments, from web servers proper to micro-servers intended for use directly inside of programs. Developers are used to the process of assembling documents for transport over HTTP, and network administrators have supported web servers and web-friendly firewalls for years.
In many ways, HTTP is an RPC-based protocol, opening with an identifier for the method being called and then providing parameters that determine what that method should return. HTTP’s relatively open approach, based on the MIME (Multipurpose Internet Mail Extensions) set of standards for identifying and encoding different kinds of content, has given it enough flexibility to carry the many kinds of content needed for web sites. That flexibility provides it with enough strength to carry the kinds of payloads an RPC protocol demands.
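To make the point concrete, here is a rough sketch that sends an XML-RPC call as an ordinary HTTP POST using Python's general-purpose http.client module rather than an XML-RPC client. The helper name post_xmlrpc, the getMass method, and the local test server are assumptions for illustration.

```python
import http.client
import threading
from xmlrpc.client import dumps, loads
from xmlrpc.server import SimpleXMLRPCServer


def post_xmlrpc(host, port, path, method, params):
    """Send one XML-RPC call as a plain HTTP POST; return (status, result)."""
    body = dumps(params, methodname=method).encode("utf-8")
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        # The payload is just an XML document labeled with a MIME type.
        conn.request("POST", path, body=body,
                     headers={"Content-Type": "text/xml"})
        response = conn.getresponse()
        status = response.status
        payload = response.read()
    finally:
        conn.close()
    (result,), _ = loads(payload)  # a response carries exactly one value
    return status, result


if __name__ == "__main__":
    # Hypothetical server with a single made-up method.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(lambda name: 7.0, "getMass")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print(post_xmlrpc("127.0.0.1", server.server_address[1], "/",
                      "getMass", ("bowling ball",)))
    server.shutdown()
    server.server_close()
```

Nothing in the exchange is special from HTTP's point of view: a POST goes out with an XML body, and a response comes back with an XML body.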
Building a Different Kind of Web
XML-RPC allows you to implement the RPC approach described previously while taking advantage of existing HTTP tools and infrastructures. Because HTTP is available on all kinds of programming environments and operating systems, and because XML parsers are similarly commodity parts, it’s relatively easy to assemble an XML-RPC toolkit for any given environment.
Most web applications are designed to present information to people. With XML-RPC and web services, however, the Web becomes a collection of procedural connections where computers exchange information along tightly bound paths. Instead of having humans surf through hypertext links, computers follow previously arranged rules for exchanging information. This exchange doesn’t have to follow the client-server model established by the Web. XML-RPC supports peer-to-peer communications as well as client-server approaches, taking advantage of HTTP’s facilities for sending information from the browser to the server more often than most web browsers do.
XML-RPC clients make procedure requests of XML-RPC servers, which return results to the XML-RPC clients. XML-RPC clients use the same HTTP facilities as web browser clients, and XML-RPC servers use the same HTTP facilities as web servers. Those roles aren’t nearly as fixed as they are in the regular web world, however. It’s common for the same program to include both XML-RPC client and server code and to use both when appropriate.
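A program that plays both roles could be sketched as follows. The Peer class, its greet method, and the use of local ports are hypothetical, chosen only to show one process acting as both XML-RPC server and XML-RPC client.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


class Peer:
    """A hypothetical node that both serves and calls XML-RPC procedures."""

    def __init__(self, name):
        self.name = name
        # Server side: listen on a free local port and expose greet().
        self.server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
        self.server.register_function(self.greet, "greet")
        threading.Thread(target=self.server.serve_forever, daemon=True).start()

    @property
    def url(self):
        host, port = self.server.server_address
        return f"http://{host}:{port}/"

    def greet(self, caller):
        """Procedure other peers can call."""
        return f"{self.name} says hello to {caller}"

    def call(self, other_url):
        """Client side: call greet() on another peer."""
        with ServerProxy(other_url) as proxy:
            return proxy.greet(self.name)

    def close(self):
        self.server.shutdown()
        self.server.server_close()


if __name__ == "__main__":
    alpha, beta = Peer("alpha"), Peer("beta")
    print(alpha.call(beta.url))
    print(beta.call(alpha.url))
    alpha.close()
    beta.close()
```

Each peer answers requests and issues them, so neither one is permanently "the client" or "the server."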
Although you can build XML-RPC handlers using traditional web techniques, there is little need to drill that deep. As a developer, you may never even need to see XML-RPC’s internals or know that the RPC system you use is running over the Web. Most XML-RPC implementations hide the details of XML-RPC from those using it, requesting only a port number to communicate over. You may need a web site administrator to set up the initial system, or you may need to integrate your XML-RPC servers with web server features like secure transactions, but once that initial setup is complete, XML-RPC is much like any other RPC system.
Where XML-RPC Excels
XML-RPC is an excellent tool for establishing a wide variety of connections between computers. If you need to integrate multiple computing environments, but don’t need to share complex data structures directly, you will find that XML-RPC lets you establish communications quickly and easily. Even if you work within a single environment, you may find that the RPC approach makes it easy to connect programs that have different data models or processing expectations and that it can provide easy access to reusable logic.
XML-RPC’s most obvious field of application is connecting different kinds of environments, allowing Java to talk with Perl to talk with Python to talk with ASP, and so on. Systems integrators often build custom connections between different systems, creating their own formats and protocols to make communications possible, but they often end up with a large number of poorly documented single-use protocols. Each piece might work very well at its appointed task, but developers have to constantly create new protocols for new tasks, and reusing previous protocols can be very difficult.
XML-RPC offers integrators an opportunity to use a standard vocabulary and approach for exchanging information. This means that developers can create open programming interfaces. Sometimes a project has clearly defined needs for connecting two or more specific environments together, and a small set of XML-RPC packages can help create a complete solution. In other cases, developers want to publish a service but don’t necessarily know what kinds of clients they have to support.
XML-RPC makes it possible for services like Meerkat (http://meerkat.oreillynet.com) to provide an interface that can be accessed from several different environments. Meerkat, which aggregates news and announcement information from hundreds of sites, can be searched through a traditional web interface or through an XML-RPC interface (documented at http://www.oreillynet.com/pub/a/rss/2000/11/14/meerkat_xmlrpc.html). Developers who want to use Meerkat information in their own applications can call functions on the Meerkat server, and Meerkat’s maintainers don’t need to know anything about the details.
Because XML-RPC is layered on top of HTTP, it inherits the inefficiencies of HTTP. This does place some limitations on its use in large-scale, high-speed applications, but inefficiency isn’t important in many places. Although there are definitely high-profile projects for which systems must scale to millions of transactions at a time, keeping response time to a minimum, there are also many projects in which systems need to send information or request processing far less often -- from once a second to once a week -- and for which response time isn’t absolutely critical. For these cases, XML-RPC can simplify developers’ lives tremendously.
A Quick Tour of the Minefields
Before moving into the details of XML-RPC and exploring its capabilities in depth, it’s worth pausing for a moment to examine some possible areas where using XML-RPC may not be appropriate. Although RPC and tunneling over HTTP are both useful technologies, both techniques can get you into trouble if you use them inappropriately. Neither technique is exactly the height of computing elegance, and there are substantial scalability and security issues that you should address at the beginning of your projects rather than at the end.
RPC architectures have some natural limitations. There are plenty of cases when RPC is still appropriate, including some in which combining logic with data in objects is either risky or excessively complex and in which messaging would add unnecessary overhead. On the other hand, the relative simplicity of RPC’s architecture means it lacks the flexibility that object-based and messaging approaches make possible. The level of abstraction in RPC is relatively low, leading to potential complexity as the number of different requests increases.
Although the descriptions in the previous section might suggest that RPC is just a message-passing mechanism, the messages can’t be arbitrary. Remote Procedure Calls, like procedure calls in programs, take a procedure name and a set of typed parameters and return a result. Although developers can build some flexibility into the parameters and the result, the nature of procedure calls brings some significant limitations for development, flexibility, and maintenance.
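The constraint that parameters be typed shows up quickly in practice. The sketch below uses Python's standard xmlrpc.client module to check which values can be marshaled; the helper name can_marshal, the describe method name, and the sample values are assumptions, and the behavior shown is that of Python's implementation rather than a statement about every XML-RPC library.

```python
from xmlrpc.client import dumps, loads


def can_marshal(value):
    """Report whether Python's XML-RPC marshaller accepts a single value."""
    try:
        dumps((value,))
        return True
    except (TypeError, OverflowError):
        return False


# Strings, numbers, lists, and string-keyed dictionaries map onto XML-RPC types.
record = {"name": "bowling ball", "mass": 7.0, "tags": ["round", "heavy"]}
params, method = loads(dumps((record,), methodname="describe"))

if __name__ == "__main__":
    for sample in (42, "text", [1, 2], {1, 2}, 2**40, None):
        print(repr(sample), can_marshal(sample))
```

Values such as sets, integers beyond 32 bits, and (by default) None have no counterpart in the basic XML-RPC type system, so an API has to be designed around the types that do.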
Development methodologies have spent the last 50 years moving toward looser and looser connections between computing components -- on both the hardware and software sides. Looser connections mean more flexibility for consumers of computing products and their developers. XML-RPC provides some flexibility, abstracting away differences between computing environments, but the procedures to which it is applied supply only limited flexibility. Careful API design can help developers create maintainable systems, but changing APIs is often more difficult than adding additional information to a message. If different systems need to see their information in different forms, API complexity can grow rapidly.
Protocol Reuse Issues
Although XML-RPC reaps enormous benefits by using HTTP as its foundation, many developers see such reuse as misguided, wrong, or even dangerous. In some sense XML-RPC’s genius lies in its perversity, its creative reuse of a standard that was designed for relatively simple document transfers. Although XML-RPC’s reuse of the software infrastructure makes sense, there are definitely those who feel that XML-RPC conflicts with the infrastructure that supports the protocol.
Although reuse issues come up on a regular basis on nearly every mailing list that touches on XML-RPC or SOAP, the most detailed discussion of reuse issues is Keith Moore’s Internet-Draft titled “On the use of HTTP as a Substrate for Other Protocols” (http://www.ietf.org/internet-drafts/draft-moore-using-http-01.txt).
HTTP isn’t very efficient
HTTP has some limitations for building distributed computing environments. It was originally created to ship HTML from servers to browsers, although later versions added support for a wide variety of file formats and for limited communications (through forms from the web browser to the web server). HTTP grew in fits and starts from a very small base, and some approaches it uses reflect compatibility needs rather than best practices. Although HTTP is easy to use, it’s not really designed for performance.
XML-RPC isn’t your average web page
An XML-RPC processor will probably be referenced using a URL such as http://www.example.com/RPC/. That URL looks awfully familiar -- it might, in fact, describe an HTML page that just happens to be retrievable from http://www.example.com/RPC/, rather than an XML-RPC processor. There might even be a form processor lurking there, waiting for requests. There is no way to tell from the bare URL that it references something outside the realm of ordinary web browsing.
HTTP already supports significant diversity for URL behavior by allowing the GET, POST, and other methods, each of which may return different information. XML-RPC takes this diversity to a new level, however, by moving outside of the normal format in which POSTed information is sent and by creating a new set of structures for defining behavior. The same URL might have hundreds, or even thousands, of different methods available to service XML-RPC requests: a big change from the “one form, one program” model common to most POST processing, and potentially larger in scale than even the most ambitious generic form processors.
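A small sketch of one URL serving several methods, using Python's standard library. The method names math.add and text.upper are invented for the example, and the system.listMethods call relies on the introspection convention that Python's server can optionally enable; it is not part of the core XML-RPC specification.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def add(a, b):
    return a + b


def upper(text):
    return text.upper()


def start_multi_method_server():
    """Expose several procedures behind a single URL."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(add, "math.add")      # hypothetical names
    server.register_function(upper, "text.upper")
    server.register_introspection_functions()      # adds system.listMethods etc.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_multi_method_server()
    host, port = server.server_address
    with ServerProxy(f"http://{host}:{port}/") as proxy:
        print(proxy.math.add(2, 3))
        print(proxy.text.upper("rpc"))
        print(proxy.system.listMethods())
    server.shutdown()
    server.server_close()
```

Every call goes to the same URL; only the method name inside the posted XML decides which procedure runs.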
XML-RPC also provides no default behavior for users hitting an XML-RPC processor with a GET request. Sending an HTTP error message is one possibility, breaking the connection is another, and sending a polite web page explaining that this URL is for XML-RPC use only is another. Developers might even choose to hide an XML-RPC processor underneath a regular HTTP URL, responding to GET requests with web pages and to XML-RPC requests with XML-RPC responses. (Don’t consider this security, however!)
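The "polite web page" option could be sketched like this with Python's standard library. The handler name PoliteHandler, the wording of the notice, and the ping method are assumptions for illustration; as the text notes, none of this adds any security.

```python
import http.client
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCRequestHandler, SimpleXMLRPCServer

NOTICE = b"<html><body><p>This URL accepts XML-RPC requests only.</p></body></html>"


class PoliteHandler(SimpleXMLRPCRequestHandler):
    """Answer browser-style GET requests with a page; POSTs stay XML-RPC."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(NOTICE)))
        self.end_headers()
        self.wfile.write(NOTICE)


def start_polite_server():
    server = SimpleXMLRPCServer(("127.0.0.1", 0),
                                requestHandler=PoliteHandler,
                                logRequests=False)
    server.register_function(lambda: "pong", "ping")  # hypothetical method
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def http_get(host, port, path="/"):
    """Fetch a URL the way a browser would; return (status, body)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()


if __name__ == "__main__":
    server = start_polite_server()
    host, port = server.server_address
    print(http_get(host, port))
    with ServerProxy(f"http://{host}:{port}/") as proxy:
        print(proxy.ping())
    server.shutdown()
    server.server_close()
```

The same URL now returns a web page to a GET and an XML-RPC response to a properly formed POST.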
Breaking through firewalls by reusing HTTP
Part of XML-RPC’s promise is its subversion of network security rules (making it possible for developers to bypass firewalls), but that is also a critical part of XML-RPC’s danger and raises vehement opposition. Although there have been plenty of security warnings about web browsers over the years, the need for people on various private networks to read the Web has given HTTP and port 80 a greater degree of freedom than most other protocols. Network administrators rely on filters, proxies, or a simple pass-through to avoid the raft of user complaints that emerge when web access is denied.
XML-RPC takes advantage of this common practice -- and states that it does so, right in the specification -- to let it establish tight bonds between systems that are on supposedly separate networks. XML-RPC already provides very little security for its transactions, and its firewall-breaching capabilities raise serious new security threats for network administrators who thought they had plugged all the holes. Adding an XML-RPC interface to the computer that holds a company’s financial information may not be so smart if that computer can be reached from outside networks. Because HTTP is effectively privileged, the odds of that computer’s XML-RPC interface being exposed are much higher than the odds of an interface built on a protocol where security is traditionally of greater concern.
To some extent, these issues aren’t too hard for network administrators to address. Many firewall and NAT setups already block incoming requests, only permitting responses to requests that originated on the internal network. Although this block would allow outgoing information flows, it would prevent the outside world from making requests of the systems on the private network. In other cases, typically those in which port 80 is considered an “open” port, network administrators may have a lot of additional work to do in figuring out how best (and if) to allow XML-RPC transactions and how to block them, if desired.
Because of these “wolf in sheep’s clothing” issues, some developers prefer to see XML-RPC and similar protocols take a different approach. Some developers find HTTP to be too insecure, too inefficient, and generally too inappropriate as a base for these application protocols, but few call for an outright ban.
Keith Moore’s “On the use of HTTP as a Substrate for Other Protocols” (http://www.ietf.org/internet-drafts/draft-moore-using-http-01.txt) outlines a list of practices he considers appropriate to proper use of HTTP, nearly all of which XML-RPC violates. XML-RPC provides no explicit security model, “masquerades” as existing HTTP traffic, uses the “http” URL scheme and port 80, doesn’t define explicitly how the client and server interact with proxies, and allows the use of HTTP errors for certain situations. As we’ll see in the next chapter, XML-RPC also provides its own mechanism for reporting procedure call faults.
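As a preview of that fault mechanism, the sketch below builds and parses a fault response with Python's standard xmlrpc.client module. The fault code 4, the message text, and the helper name parse_response are arbitrary choices for illustration; XML-RPC leaves fault codes up to the application.

```python
from xmlrpc.client import Fault, dumps, loads

# A fault travels inside a normal XML-RPC response document.
fault_xml = dumps(Fault(4, "Unknown object: feather"))
ok_xml = dumps((21.0,), methodresponse=True)


def parse_response(xml_text):
    """Return ("ok", value) or ("fault", code, message) for a response document."""
    try:
        (result,), _ = loads(xml_text)
        return ("ok", result)
    except Fault as fault:
        return ("fault", fault.faultCode, fault.faultString)


if __name__ == "__main__":
    print(fault_xml)
    print(parse_response(fault_xml))
    print(parse_response(ok_xml))
```

Because the fault is expressed in the XML payload, a caller can receive it as a structured error in its own language instead of having to interpret an HTTP status code.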
We’ll consider these issues again in Chapter 8, after we’ve explored XML-RPC more deeply. That chapter also covers some alternatives to XML-RPC that have emerged, such as the Simple Object Access Protocol (SOAP); Universal Description, Discovery, and Integration (UDDI); and Web Services Description Language (WSDL). For now, these warnings are worth keeping in mind, especially if you have to explain how and why you’re using XML-RPC to an unsympathetic network administrator. In simple situations, especially when you control both the network and all systems on it, these issues probably won’t cause you any harm.