HTTP: The Definitive Guide

By David Gourley, Brian Totty
With Marjorie Sayer, Sailu Reddy, and Anshu Aggarwal
Table of Contents

Chapter 1: Overview of HTTP
The world's web browsers, servers, and related web applications all talk to each other through HTTP, the Hypertext Transfer Protocol. HTTP is the common language of the modern global Internet.
This chapter is a concise overview of HTTP. You'll see how web applications use HTTP to communicate, and you'll get a rough idea of how HTTP does its job. In particular, we talk about:
  • How web clients and servers communicate
  • Where resources (web content) come from
  • How web transactions work
  • The format of the messages used for HTTP communication
  • The underlying TCP network transport
  • The different variations of the HTTP protocol
  • Some of the many HTTP architectural components installed around the Internet
We've got a lot of ground to cover, so let's get started on our tour of HTTP.
HTTP: The Internet's Multimedia Courier
Billions of JPEG images, HTML pages, text files, MPEG movies, WAV audio files, Java applets, and more cruise through the Internet each and every day. HTTP moves the bulk of this information quickly, conveniently, and reliably from web servers all around the world to web browsers on people's desktops.
Because HTTP uses reliable data-transmission protocols, it guarantees that your data will not be damaged or scrambled in transit, even when it comes from the other side of the globe. This is good for you as a user, because you can access information without worrying about its integrity. Reliable transmission is also good for you as an Internet application developer, because you don't have to worry about HTTP communications being destroyed, duplicated, or distorted in transit. You can focus on programming the distinguishing details of your application, without worrying about the flaws and foibles of the Internet.
Let's look more closely at how HTTP transports the Web's traffic.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Web Clients and Servers
Web content lives on web servers. Web servers speak the HTTP protocol, so they are often called HTTP servers. These HTTP servers store the Internet's data and provide the data when it is requested by HTTP clients. The clients send HTTP requests to servers, and servers return the requested data in HTTP responses, as sketched in Figure 1-1. Together, HTTP clients and HTTP servers make up the basic components of the World Wide Web.
Figure 1-1: Web clients and servers
You probably use HTTP clients every day. The most common client is a web browser, such as Microsoft Internet Explorer or Netscape Navigator. Web browsers request HTTP objects from servers and display the objects on your screen.
When you browse to a page, such as "http://www.oreilly.com/index.html," your browser sends an HTTP request to the server www.oreilly.com (see Figure 1-1). The server tries to find the desired object (in this case, "/index.html") and, if successful, sends the object to the client in an HTTP response, along with the type of the object, the length of the object, and other information.
Resources
Web servers host web resources. A web resource is the source of web content. The simplest kind of web resource is a static file on the web server's filesystem. These files can contain anything: they might be text files, HTML files, Microsoft Word files, Adobe Acrobat files, JPEG image files, AVI movie files, or any other format you can think of.
However, resources don't have to be static files. Resources can also be software programs that generate content on demand. These dynamic content resources can generate content based on your identity, on what information you've requested, or on the time of day. They can show you a live image from a camera, or let you trade stocks, search real estate databases, or buy gifts from online stores (see Figure 1-2).
Figure 1-2: A web resource is anything that provides web content
In summary, a resource is any kind of content source. A file containing your company's sales forecast spreadsheet is a resource. A web gateway to scan your local public library's shelves is a resource. An Internet search engine is a resource.
Because the Internet hosts many thousands of different data types, HTTP carefully tags each object being transported through the Web with a data format label called a MIME type. MIME (Multipurpose Internet Mail Extensions) was originally designed to solve problems encountered in moving messages between different electronic mail systems. MIME worked so well for email that HTTP adopted it to describe and label its own multimedia content.
Web servers attach a MIME type to all HTTP object data (see Figure 1-3). When a web browser gets an object back from a server, it looks at the associated MIME type to see if it knows how to handle the object. Most browsers can handle hundreds of popular object types: displaying image files, parsing and formatting HTML files, playing audio files through the computer's speakers, or launching external plug-in software to handle special formats.
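As a rough illustration of this labeling step, the sketch below uses Python's standard mimetypes module to guess a MIME type from a filename extension and format it as a Content-Type header line. The filenames and the helper name content_type_header are hypothetical, and real servers may pick types by other means (configuration tables, content inspection).

```python
import mimetypes

def content_type_header(filename: str) -> str:
    """Guess a MIME type from the file extension and format a header line."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    # Fall back to a generic binary type when the extension is unknown.
    return f"Content-Type: {mime_type or 'application/octet-stream'}"

# Hypothetical filenames; only the extensions matter to the lookup.
for name in ["index.html", "logo.jpeg", "notes.txt"]:
    print(name, "->", content_type_header(name))
```

A browser performs the reverse step: it reads the label and decides whether to render the object, play it, or hand it to a plug-in.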
Transactions
Let's look in more detail at how clients use HTTP to transact with web servers and their resources. An HTTP transaction consists of a request command (sent from client to server), and a response result (sent from the server back to the client). This communication happens with formatted blocks of data called HTTP messages, as illustrated in Figure 1-5.
Figure 1-5: HTTP transactions consist of request and response messages
HTTP supports several different request commands, called HTTP methods. Every HTTP request message has a method. The method tells the server what action to perform (fetch a web page, run a gateway program, delete a file, etc.). Table 1-2 lists five common HTTP methods.
Table 1-2: Some common HTTP methods
HTTP method | Description
GET | Send named resource from the server to the client.
PUT | Store data from client into a named server resource.
DELETE | Delete the named resource from a server.
POST | Send client data into a server gateway application.
HEAD |
Messages
Now let's take a quick look at the structure of HTTP request and response messages. We'll study HTTP messages in exquisite detail in Chapter 3.
HTTP messages are simple, line-oriented sequences of characters. Because they are plain text, not binary, they are easy for humans to read and write. Figure 1-7 shows the HTTP messages for a simple transaction.
Figure 1-7: HTTP messages have a simple, line-oriented text structure
HTTP messages sent from web clients to web servers are called request messages. Messages from servers to clients are called response messages. There are no other kinds of HTTP messages. The formats of HTTP request and response messages are very similar.
HTTP messages consist of three parts:
Start line
The first line of the message is the start line, indicating what to do for a request or what happened for a response.
Header fields
Zero or more header fields follow the start line. Each header field consists of a name and a value, separated by a colon (:) for easy parsing. The headers end with a blank line. Adding a header field is as easy as adding another line.
Body
After the blank line is an optional message body containing any kind of data. Request bodies carry data to the web server; response bodies carry data back to the client. Unlike the start lines and headers, which are textual and structured, the body can contain arbitrary binary data (e.g., images, videos, audio tracks, software applications). Of course, the body can also contain text.
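The three parts can be pulled apart with a few lines of code. The following is a minimal sketch, not a full parser: the function name parse_message and the sample response are made up for illustration, and repeated header names simply overwrite one another here.

```python
def parse_message(raw: bytes):
    """Split a raw HTTP message into (start_line, headers, body)."""
    # The first blank line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Each header field is "name: value"; split on the first colon only.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return start_line, headers, body

# A made-up response message used only as sample input.
sample = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-type: text/plain\r\n"
    b"Content-length: 19\r\n"
    b"\r\n"
    b"Hi! I'm a message!\n"
)

start_line, headers, body = parse_message(sample)
print(start_line)
print(headers)
print(body)
```

The body is kept as bytes because, unlike the start line and headers, it may hold arbitrary binary data.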
Connections
Now that we've sketched what HTTP's messages look like, let's talk for a moment about how messages move from place to place, across Transmission Control Protocol (TCP) connections.
HTTP is an application layer protocol. HTTP doesn't worry about the nitty-gritty details of network communication; instead, it leaves the details of networking to TCP/IP, the popular reliable Internet transport protocol.
TCP provides:
  • Error-free data transportation
  • In-order delivery (data will always arrive in the order in which it was sent)
  • Unsegmented data stream (can dribble out data in any size at any time)
The Internet itself is based on TCP/IP, a popular layered set of packet-switched network protocols spoken by computers and network devices around the world. TCP/IP hides the peculiarities and foibles of individual networks and hardware, letting computers and networks of any type talk together reliably.
Once a TCP connection is established, messages exchanged between the client and server computers will never be lost, damaged, or received out of order.
In networking terms, the HTTP protocol is layered over TCP. HTTP uses TCP to transport its message data. Likewise, TCP is layered over IP (see Figure 1-9).
Figure 1-9: HTTP network protocol stack
Before an HTTP client can send a message to a server, it needs to establish a TCP/IP connection between the client and server using Internet protocol (IP) addresses and port numbers.
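To show the layering in action, here is a small sketch in which a client opens a TCP connection to an IP address and port, writes an HTTP request, and reads the response back as an ordered byte stream. Everything in it is an assumption made for illustration: the loopback server is a stand-in that sends one canned reply, and the names serve_once, fetch, and demo are hypothetical.

```python
import socket
import threading

CANNED_RESPONSE = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 6\r\n"
    b"\r\n"
    b"hello\n"
)

def serve_once(listener: socket.socket) -> None:
    """Accept one connection, read the request head, send the canned reply."""
    conn, _addr = listener.accept()
    with conn:
        data = b""
        while b"\r\n\r\n" not in data:      # read until the blank line
            chunk = conn.recv(1024)
            if not chunk:
                break
            data += chunk
        conn.sendall(CANNED_RESPONSE)       # closing conn signals the end

def fetch(host: str, port: int, path: str) -> bytes:
    """Open a TCP connection, send a GET request, return the raw response."""
    with socket.create_connection((host, port), timeout=5) as sock:
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            chunk = sock.recv(1024)         # data may arrive in any sizes
            if not chunk:                   # empty read: server closed
                break
            chunks.append(chunk)
    return b"".join(chunks)

def demo() -> bytes:
    # Port 0 asks the operating system for any free port on loopback.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        port = listener.getsockname()[1]
        server = threading.Thread(target=serve_once, args=(listener,))
        server.start()
        reply = fetch("127.0.0.1", port, "/index.html")
        server.join()
    return reply

print(demo().decode("ascii"))
```

The client loops on recv because TCP delivers a byte stream, not whole messages; the pieces may arrive in any sizes, but always in the order they were sent.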
Protocol Versions
There are several versions of the HTTP protocol in use today. HTTP applications need to work hard to robustly handle different variations of the HTTP protocol. The versions in use are:
HTTP/0.9
The 1991 prototype version of HTTP is known as HTTP/0.9. This protocol contains many serious design flaws and should be used only to interoperate with legacy clients. HTTP/0.9 supports only the GET method, and it does not support MIME typing of multimedia content, HTTP headers, or version numbers. HTTP/0.9 was originally defined to fetch simple HTML objects. It was soon replaced with HTTP/1.0.
HTTP/1.0
HTTP/1.0 was the first version of HTTP that was widely deployed. HTTP/1.0 added version numbers, HTTP headers, additional methods, and multimedia object handling. HTTP/1.0 made it practical to support graphically appealing web pages and interactive forms, which helped promote the wide-scale adoption of the World Wide Web. HTTP/1.0 itself was never rigorously specified; the specification represented a collection of best practices in a time of rapid commercial and academic evolution of the protocol.
HTTP/1.0+
Many popular web clients and servers rapidly added features to HTTP in the mid-1990s to meet the demands of a rapidly expanding, commercially successful World Wide Web. Many of these features, including long-lasting "keep-alive" connections, virtual hosting support, and proxy connection support, were added to HTTP and became unofficial, de facto standards. This informal, extended version of HTTP is often referred to as HTTP/1.0+.
HTTP/1.1
HTTP/1.1 focused on correcting architectural flaws in the design of HTTP, specifying semantics, introducing significant performance optimizations, and removing mis-features. HTTP/1.1 also included support for the more sophisticated web applications and deployments that were under way in the late 1990s. HTTP/1.1 is the current version of HTTP.
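One place where an application meets these variations is the request line. The sketch below assumes the common layout of method, target, and an optional "HTTP/major.minor" token, and treats a request line with no version token as HTTP/0.9, which has no version numbers. The function name request_version is hypothetical.

```python
def request_version(request_line: str):
    """Return the (major, minor) HTTP version named in a request line."""
    parts = request_line.split()
    if len(parts) == 2:
        # "GET /path" with no version token: treat as an HTTP/0.9 request.
        return (0, 9)
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        raise ValueError(f"unrecognized request line: {request_line!r}")
    major, _, minor = parts[2][len("HTTP/"):].partition(".")
    return (int(major), int(minor))

for line in ["GET /index.html", "GET /index.html HTTP/1.0", "GET / HTTP/1.1"]:
    print(line, "->", request_version(line))
```

Returning a tuple lets callers compare versions directly, for example request_version(line) >= (1, 1).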
Architectural Components of the Web
In this overview chapter, we've focused on how two web applications (web browsers and web servers) send messages back and forth to implement basic transactions. There are many other web applications that you interact with on the Internet. In this section, we'll outline several other important applications, including:
Proxies
HTTP intermediaries that sit between clients and servers
Caches
HTTP storehouses that keep copies of popular web pages close to clients
Gateways
Special web servers that connect to other applications
Tunnels
Special proxies that blindly forward HTTP communications
Agents
Semi-intelligent web clients that make automated HTTP requests
Let's start by looking at HTTP proxy servers, important building blocks for web security, application integration, and performance optimization.
As shown in Figure 1-11, a proxy sits between a client and a server, receiving all of the client's HTTP requests and relaying the requests to the server (perhaps after modifying the requests). These applications act as a proxy for the user, accessing the server on the user's behalf.
Figure 1-11: Proxies relay traffic between client and server
Proxies are often used for security, acting as trusted intermediaries through which all web traffic flows. Proxies can also filter requests and responses; for example, to detect application viruses in corporate downloads or to filter adult content away from elementary-school students. We'll talk about proxies in detail in Chapter 6.
The End of the Beginning
That's it for our quick introduction to HTTP. In this chapter, we highlighted HTTP's role as a multimedia transport protocol. We outlined how HTTP uses URIs to name multimedia resources on remote servers, we sketched how HTTP request and response messages are used to manipulate multimedia resources on remote servers, and we finished by surveying a few of the web applications that use HTTP.
The remaining chapters explain the technical machinery of the HTTP protocol, applications, and resources in much more detail.
For More Information
Later chapters of this book will explore HTTP in much more detail, but you might find that some of the following sources contain useful background about particular topics we covered in this chapter.
HTTP Pocket Reference
Clinton Wong, O'Reilly & Associates, Inc. This little book provides a concise introduction to HTTP and a quick reference to each of the headers and status codes that compose HTTP transactions.
http://www.w3.org/Protocols/
This W3C web page contains many great links about the HTTP protocol.
http://www.ietf.org/rfc/rfc2616.txt
RFC 2616, "Hypertext Transfer Protocol—HTTP/1.1," is the official specification for HTTP/1.1, the current version of the HTTP protocol. The specification is a well-written, well-organized, detailed reference for HTTP, but it isn't ideal for readers who want to learn the underlying concepts and motivations of HTTP or the differences between theory and practice. We hope that this book fills in the underlying concepts, so you can make better use of the specification.
http://www.ietf.org/rfc/rfc1945.txt
RFC 1945, "Hypertext Transfer Protocol—HTTP/1.0," is an informational RFC that describes the modern foundation for HTTP. It details the officially sanctioned and "best-practice" behavior of web applications at the time the specification was written. It also contains some useful descriptions about behavior that is deprecated in HTTP/1.1 but still widely implemented by legacy applications.
Chapter 2: URLs and Resources
Think of the Internet as a giant, expanding city, full of places to see and things to do. You and the other residents and tourists of this booming community would use standard naming conventions for the city's vast attractions and services. You'd use street addresses for museums, restaurants, and people's homes. You'd use phone numbers for the fire department, the boss's secretary, and your mother, who says you don't call enough.
Everything has a standardized name, to help sort out the city's resources. Books have ISBN numbers, buses have route numbers, bank accounts have account numbers, and people have social security numbers. Tomorrow you will meet your business partners at gate 31 of the airport. Every morning you take a Red-line train and exit at Kendall Square station.
And because everyone agreed on standards for these different names, we can easily share the city's treasures with each other. You can tell the cab driver to take you to 246 McAllister Street, and he'll know what you mean (even if he takes the long way).
Uniform resource locators (URLs) are the standardized names for the Internet's resources. URLs point to pieces of electronic information, telling you where they are located and how to interact with them.
In this chapter, we'll cover:
  • URL syntax and what the various URL components mean and do
  • URL shortcuts that many web clients support, including relative URLs and expandomatic URLs
  • URL encoding and character rules
  • Common URL schemes that support a variety of Internet information systems
  • The future of URLs, including uniform resource names (URNs)—a framework to support objects that move from place to place while retaining stable names
Navigating the Internet's Resources
URLs are the resource locations that your browser needs to find information. They let people and applications find, use, and share the billions of data resources on the Internet. URLs are the usual human access point to HTTP and other protocols: a person points a browser at a URL and, behind the scenes, the browser sends the appropriate protocol messages to get the resource that the person wants.
URLs actually are a subset of a more general class of resource identifier called a uniform resource identifier, or URI. URIs are a general concept comprised of two main subsets, URLs and URNs. URLs identify resources by describing where resources are located, whereas URNs (which we'll cover later in this chapter) identify resources by name, regardless of where they currently reside.
The HTTP specification uses the more general concept of URIs as its resource identifiers; in practice, however, HTTP applications deal only with the URL subset of URIs. Throughout this book, we'll sometimes refer to URIs and URLs interchangeably, but we're almost always talking about URLs.
Say you want to fetch the URL http://www.joes-hardware.com/seasonal/index-fall.html:
  • The first part of the URL (http) is the URL scheme. The scheme tells a web client how to access the resource. In this case, the URL says to use the HTTP protocol.
  • The second part of the URL (www.joes-hardware.com) is the server location. This tells the web client where the resource is hosted.
  • The third part of the URL (/seasonal/index-fall.html) is the resource path. The path tells what particular local resource on the server is being requested.
See Figure 2-1 for an illustration.
Figure 2-1: How URLs relate to browser, machine, server, and location on the server's filesystem
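To make the three parts concrete, the sketch below shows one way a client might turn this URL into the text of a request: the scheme selects the protocol, the server location says where to connect, and the path goes into the request line. The helper name build_get_request is hypothetical, and the choice of HTTP/1.1 with a Host header is an assumption of this sketch rather than something stated in this section.

```python
from urllib.parse import urlsplit

def build_get_request(url: str) -> str:
    """Build the text of a simple GET request for an http URL."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles the http scheme")
    path = parts.path or "/"          # an empty path means the root resource
    # The query string is ignored here to keep the example short.
    return f"GET {path} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"

url = "http://www.joes-hardware.com/seasonal/index-fall.html"
print(build_get_request(url))
```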
URL Syntax
URLs provide a means of locating any resource on the Internet, but these resources can be accessed by different schemes (e.g., HTTP, FTP, SMTP), and URL syntax varies from scheme to scheme.
Does this mean that each different URL scheme has a radically different syntax? In practice, no. Most URLs adhere to a general URL syntax, and there is significant overlap in the style and syntax between different URL schemes.
Most URL schemes base their URL syntax on this nine-part general format:
<scheme>://<user>:<password>@<host>:<port>/<path>;<params>?<query>#<frag>
Almost no URLs contain all these components. The three most important parts of a URL are the scheme, the host, and the path. Table 2-1 summarizes the various components.
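For a quick feel for the general format, the following sketch splits a URL into the nine named components using Python's standard urllib.parse module. The sample URL, including its username, password, and port, is invented for illustration, and the dictionary keys simply mirror the placeholders in the format above.

```python
from urllib.parse import urlparse

def url_components(url: str) -> dict:
    """Split a URL into the nine components of the general format."""
    p = urlparse(url)
    return {
        "scheme": p.scheme,
        "user": p.username,      # None when absent
        "password": p.password,  # None when absent
        "host": p.hostname,
        "port": p.port,          # None when absent
        "path": p.path,
        "params": p.params,
        "query": p.query,
        "frag": p.fragment,
    }

# An invented URL that happens to use every component.
full = "http://joe:secret@www.joes-hardware.com:8080/tools/hammers.html;sale=false?item=12731#claw"
for name, value in url_components(full).items():
    print(f"{name:9} {value!r}")
```

Running the same function on a typical URL such as http://www.joes-hardware.com/index.html leaves most fields empty or None, which matches the observation that almost no URLs contain all nine components.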
Table 2-1: General URL components
Component | Description | Default value
scheme | Which protocol to use when accessing a server to get a resource. | None
user | The username some schemes require to access a resource. | anonymous
password | The password that may be included after the username, separated by a colon (:). | <Email address>
host | The hostname or dotted IP address of the server hosting the resource. |
URL Shortcuts
Web clients understand and use a few URL shortcuts. Relative URLs are a convenient shorthand for specifying a resource within a resource. Many browsers also support "automatic expansion" of URLs, where the user can type in a key (memorable) part of a URL, and the browser fills in the rest. This is explained in Section 2.3.2.
URLs come in two flavors: absolute and relative. So far, we have looked only at absolute URLs. With an absolute URL, you have all the information you need to access a resource.
On the other hand, relative URLs are incomplete. To get all the information needed to access a resource from a relative URL, you must interpret it relative to another URL, called its base.
Relative URLs are a convenient shorthand notation for URLs. If you have ever written HTML by hand, you have probably found them to be a great shortcut. Example 2-1 contains an example HTML document with an embedded relative URL.
Example 2-1. HTML snippet with relative URLs
<HTML>
<HEAD><TITLE>Joe's Tools</TITLE></HEAD>
<BODY>
<H1> Tools Page </H1>
<H2> Hammers </H2>
<P> Joe's Hardware Online has the largest selection of <A HREF="./hammers.html">hammers</A>
</BODY>
</HTML>
In Example 2-1, we have an HTML document for the resource:
http://www.joes-hardware.com/tools.html
In the HTML document, there is a hyperlink containing the URL ./hammers.html. This URL seems incomplete, but it is a legal relative URL. It can be interpreted relative to the URL of the document in which it is found; in this case, relative to the resource /tools.html on the Joe's Hardware web server.
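The resolution step can be tried out with urljoin from Python's standard urllib.parse module. This is only a sketch of the idea using the URLs from Example 2-1; the helper name resolve is hypothetical.

```python
from urllib.parse import urljoin

# The URL of the document that contains the link (the base).
BASE = "http://www.joes-hardware.com/tools.html"

def resolve(relative: str, base: str = BASE) -> str:
    """Interpret a relative URL against the base URL."""
    return urljoin(base, relative)

print(resolve("./hammers.html"))
# An absolute URL needs no base and is returned as given.
print(resolve("http://www.joes-hardware.com/index.html"))
```

The scheme and host are taken from the base, so ./hammers.html resolves to http://www.joes-hardware.com/hammers.html.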
The abbreviated relative URL syntax lets HTML authors omit from URLs the scheme, host, and other components. These components can be inferred by the
Shady Characters
URLs were designed to be portable. They were also designed to uniformly name all the resources on the Internet, which means that they will be transmitted through various protocols. Because all of these protocols have different mechanisms for transmitting their data, it was important for URLs to be designed so that they could be transmitted safely through any Internet protocol.
Safe transmission means that URLs can be transmitted without the risk of losing information. Some protocols, such as the Simple Mail Transfer Protocol (SMTP) for electronic mail, use transmission methods that can strip off certain characters. To get around this, URLs are permitted to contain only characters from a relatively small, universally safe alphabet.
In addition to wanting URLs to be transportable by all Internet protocols, designers wanted them to be readable by people. So invisible, nonprinting characters also are prohibited in URLs, even though these characters may pass through mailers and otherwise be portable.
To complicate matters further, URLs also need to be complete. URL designers realized there would be times when people would want URLs to contain binary data or characters outside of the universally safe alphabet. So, an escape mechanism was added, allowing unsafe characters to be encoded into safe characters for transport.
This section summarizes the universal alphabet and encoding rules for URLs.
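As a small preview of the escape mechanism, the sketch below uses quote and unquote from Python's standard urllib.parse module, which write an unsafe character as a percent sign followed by two hexadecimal digits. The paths are invented, and the helper names escape_path and unescape_path are hypothetical.

```python
from urllib.parse import quote, unquote

def escape_path(path: str) -> str:
    """Encode unsafe characters in a path, leaving '/' separators alone."""
    return quote(path, safe="/")

def unescape_path(path: str) -> str:
    """Decode escaped characters back to their original form."""
    return unquote(path)

print(escape_path("/seasonal/index fall.html"))   # space is escaped
print(escape_path("/tools/100%.html"))            # '%' itself is escaped
print(unescape_path("/%7Ejoe"))                   # %7E decodes to '~'
```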
Default computer system character sets often have an Anglocentric bias. Historically, many computer applications have used the US-ASCII character set. US-ASCII uses 7 bits to represent most keys available on an English typewriter and a few nonprinting control characters for text formatting and hardware signalling.
US-ASCII is very portable, due to its long legacy. But while it's convenient to citizens of the U.S., it doesn't support the inflected characters common in European languages or the hundreds of non-Romanic languages read by billions of people around the world.
A Sea of Schemes
In this section, we'll take a look at the more common scheme formats on the Web. Appendix A gives a fairly exhaustive list of schemes and references to their individual documentation.
Table 2-4 summarizes some of the most popular schemes. Reviewing Section 2.2 will make the syntax portion of the table a little more familiar.
Table 2-4: Common scheme formats
Scheme
Description
http
The Hypertext Transfer Protocol scheme conforms to the general URL format, except that there is no username or password. The port defaults to 80 if omitted.
Basic form:
http://<host>:<port>/<path>?<query>#<frag>
Examples:
http://www.joes-hardware.com/index.html
http://www.joes-hardware.com:80/index.html
https
The https scheme is a twin to the http scheme. The only difference is that the https scheme uses Netscape's Secure Sockets Layer (SSL), which provides end-to-end encryption of HTTP connections. Its syntax is identical to that of HTTP, with a default port of 443.
Basic form:
https://<host>:<port>/<path>?<query>#<frag>
Example:
The Future
URLs are a powerful tool. Their design allows them to name all existing objects and easily encompass new formats. They provide a uniform naming mechanism that can be shared between Internet protocols.
However, they are not perfect. URLs are really addresses, not true names. This means that a URL tells you where something is located, for the moment. It provides you with the name of a specific server on a specific port, where you can find the resource. The downfall of this scheme is that if the resource is moved, the URL is no longer valid. And at that point, it provides no way to locate the object.
What would be ideal is if you had the real name of an object, which you could use to look up that object regardless of its location. As with a person, given the name of the resource and a few other facts, you could track down that resource, regardless of where it moved.
The Internet Engineering Task Force (IETF) has been working on a new standard, uniform resource names (URNs), for some time now, to address just this issue. URNs provide a stable name for an object, regardless of where that object moves (either inside a web server or across web servers).
Persistent uniform resource locators (PURLs) are an example of how URN functionality can be achieved using URLs. The concept is to introduce another level of indirection in looking up a resource, using an intermediary resource locator server that catalogues and tracks the actual URL of a resource. A client can request a persistent URL from the locator, which can then respond with a resource that redirects the client to the actual and current URL for the resource (see Figure 2-6). For more information on PURLs, visit http://purl.oclc.org.
Figure 2-6: PURLs use a resource locator server to name the current location of a resource
For More Information
For more information on URLs, refer to:
http://www.w3.org/Addressing/
The W3C web page about naming and addressing URIs and URLs.
http://www.ietf.org/rfc/rfc1738
RFC 1738, "Uniform Resource Locators (URL)," by T. Berners-Lee, L. Masinter, and M. McCahill.
http://www.ietf.org/rfc/rfc2396.txt
RFC 2396, "Uniform Resource Identifiers (URI): Generic Syntax," by T. Berners-Lee, R. Fielding, and L. Masinter.
http://www.ietf.org/rfc/rfc2141.txt
RFC 2141, "URN Syntax," by R. Moats.
http://purl.oclc.org
The persistent uniform resource locator web site.
http://www.ietf.org/rfc/rfc1808.txt
RFC 1808, "Relative Uniform Resource Locators," by R. Fielding.
Chapter 3: HTTP Messages
If HTTP is the Internet's courier, HTTP messages are the packages it uses to move things around. In Chapter 1, we showed how HTTP programs send each other messages to get work done. This chapter tells you all about HTTP messages—how to create them and how to understand them. After reading this chapter, you'll know most of what you need to know to write your own HTTP applications. In particular, you'll understand:
  • How messages flow
  • The three parts of HTTP messages (start line, headers, and entity body)
  • The differences between request and response messages
  • The various functions (methods) that request messages support
  • The various status codes that are returned with response messages
  • What the various HTTP headers do
The Flow of Messages
HTTP messages are the blocks of data sent between HTTP applications. These blocks of data begin with some text meta-information describing the message contents and meaning, followed by optional data. These messages flow between clients, servers, and proxies. The terms "inbound," "outbound," "upstream," and "downstream" describe message direction.
HTTP uses the terms inbound and outbound to describe transactional direction. Messages travel inbound to the origin server, and when their work is done, they travel outbound back to the user agent (see Figure 3-1).
Figure 3-1: Messages travel inbound to the origin server and outbound back to the client
HTTP messages flow like rivers. All messages flow downstream, regardless of whether they are request messages or response messages (see Figure 3-2). The sender of any message is upstream of the receiver. In Figure 3-2, proxy 1 is upstream of proxy 3 for the request but downstream of proxy 3 for the response.
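To make the upstream/downstream rule concrete, here is a minimal Python sketch. The hop names and the is_upstream helper are our own illustration (assumptions, not taken from Figure 3-2); the only rule it encodes is that the sender of a message is upstream of the receiver.

```python
# Hypothetical chain of hops, listed in the order a request visits them.
REQUEST_PATH = ["client", "proxy1", "proxy2", "proxy3", "server"]

# A response retraces the same hops in the opposite order.
RESPONSE_PATH = list(reversed(REQUEST_PATH))


def is_upstream(sender, receiver, path):
    """Return True if `sender` comes before `receiver` along `path`.

    `path` lists the hops in the order the message visits them, so
    anything earlier in the list is upstream of anything later.
    """
    return path.index(sender) < path.index(receiver)


# proxy1 relative to proxy3, first for the request, then for the response.
print(is_upstream("proxy1", "proxy3", REQUEST_PATH))
print(is_upstream("proxy1", "proxy3", RESPONSE_PATH))
```

Note that "upstream" is a property of a single message's journey, not of a machine: the same proxy changes roles when the response comes back.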
Figure 3-2: All messages flow downstream
The Parts of a Message
HTTP messages are simple, formatted blocks of data. Take a peek at Figure 3-3 for an example. Each message contains either a request from a client or a response from a server. Every message consists of three parts: a start line describing the message, a block of headers containing attributes, and an optional body containing data.
Figure 3-3: Three parts of an HTTP message
The start line and headers are just ASCII text, broken up by lines. Each line ends with a two-character end-of-line sequence, consisting of a carriage return (ASCII 13) and a line-feed character (ASCII 10). This end-of-line sequence is written "CRLF." It is worth pointing out that while the HTTP specification for terminating lines is CRLF, robust applications also should accept just a line-feed character. Some older or broken HTTP applications do not always send both the carriage return and line feed.
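One way a tolerant receiver might handle line endings is sketched below in Python. The function name split_header_lines and the sample request are our own (hypothetical); the sketch simply splits on the line feed and discards a preceding carriage return when one is present, so CRLF and bare LF are treated alike.

```python
def split_header_lines(raw: bytes) -> list[bytes]:
    """Split a start line and header block into individual lines.

    Accepts the standard CRLF terminator and, for robustness, a bare LF.
    """
    lines = []
    for line in raw.split(b"\n"):
        # Drop the carriage return if the sender included one.
        if line.endswith(b"\r"):
            line = line[:-1]
        lines.append(line)
    # A trailing terminator leaves one empty element at the end; drop it.
    if lines and lines[-1] == b"":
        lines.pop()
    return lines


# The same (made-up) request, once with CRLF and once with bare LF.
strict = b"GET /index.html HTTP/1.1\r\nHost: www.example.com\r\n"
sloppy = b"GET /index.html HTTP/1.1\nHost: www.example.com\n"
print(split_header_lines(strict) == split_header_lines(sloppy))
```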
The entity body or message body (or just plain "body") is simply an optional chunk of data. Unlike the start line and headers, the body can contain text or binary data or can be empty.
In the example in Figure 3-3, the headers give you a bit of information about the body. The Content-Type line tells you what the body is—in this example, it is a plain-text document. The Content-Length line tells you how big the body is; here it is a meager 19 bytes.
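The three-part structure can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not a complete parser: parse_message is a hypothetical helper, the sample message is made up rather than copied from Figure 3-3, the input is assumed to use CRLF throughout, and repeated headers simply overwrite one another.

```python
def parse_message(raw: bytes):
    """Split a raw HTTP message into (start_line, headers, body)."""
    # An empty line (two CRLFs in a row) ends the header block.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# A made-up plain-text response; Content-Length is computed from the body.
body = b"Hello, HTTP reader!"
raw = b"\r\n".join([
    b"HTTP/1.0 200 OK",
    b"Content-Type: text/plain",
    b"Content-Length: %d" % len(body),
    b"",  # the empty line that separates headers from body
    body,
])

start_line, headers, parsed_body = parse_message(raw)
print(start_line)
print(headers["content-type"], headers["content-length"])
print(int(headers["content-length"]) == len(parsed_body))
```

The start line and headers are decoded as ASCII text, while the body is left as raw bytes because it may carry binary data.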
All HTTP messages fall into two types: request messages and response messages. Request messages request an action from a web server. Response messages carry results of a request back to a client. Both request and response messages have the same basic message structure. Figure 3-4 shows request and response messages to get a GIF image.
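Because requests and responses share one structure, the start line is what tells them apart: a request start line carries a method, a request URL, and a version, while a response start line carries a version, a status code, and a reason phrase. The Python sketch below relies on that difference. The helper name describe_start_line and the sample URL are hypothetical, and the sketch assumes well-formed, single-space-separated start lines.

```python
def describe_start_line(start_line: str) -> dict:
    """Classify a start line as a request or a response and split its fields."""
    if start_line.startswith("HTTP/"):
        # Response: <version> <status code> <reason phrase>
        parts = start_line.split(" ", 2)
        reason = parts[2] if len(parts) > 2 else ""
        return {"kind": "response", "version": parts[0],
                "status": int(parts[1]), "reason": reason}
    # Request: <method> <request URL> <version>
    method, url, version = start_line.split(" ")
    return {"kind": "request", "method": method, "url": url, "version": version}


print(describe_start_line("GET /images/logo.gif HTTP/1.0"))
print(describe_start_line("HTTP/1.0 200 OK"))
```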
Methods
Let's talk in more detail about some of the basic HTTP methods, listed earlier in Table 3-1. Note that not all methods are implemented by every server. To be compliant with HTTP Version 1.1, a server need implement only the GET and HEAD methods for its resources.
Even when servers do implement all of these methods, the methods most likely have restricted uses. For example, servers that support DELETE or PUT (described later in this section) would not want just anyone to be able to delete or store resources. These restrictions generally are set up in the server's configuration, so they vary from site to site and from server to server.
HTTP defines a set of methods that are called safe methods. The GET and HEAD methods are said to be safe, meaning that no action should occur as a result of an HTTP request that uses either the GET or HEAD method.
By no action, we mean that nothing will happen on the server as a result of the HTTP request. For example, consider when you are shopping online at Joe's Hardware and you click on the "submit purchase" button. Clicking on the button submits a POST request (discussed later) with your credit card information, and an action is performed on the server on your behalf. In this case, the action is your credit card being charged for your purchase.
There is no guarantee that a safe method won't cause an action to be performed (in practice, that is up to the web developers). Safe methods are meant to allow HTTP application developers to let users know when an unsafe method that may cause some action to be performed is being used. In our Joe's Hardware example, your web browser may pop up a warning message letting you know that you are making a request with an unsafe method and that, as a result, something might happen on the server (e.g., your credit card being charged).
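A client might apply the safe-method distinction roughly as follows. This is a sketch, not how any particular browser works: SAFE_METHODS lists only the two methods named above, and confirm_before_sending and its ask callback are hypothetical names standing in for whatever warning dialog an application would show.

```python
# The methods this section describes as safe. Method names are
# case-sensitive, so they are compared exactly as written.
SAFE_METHODS = frozenset({"GET", "HEAD"})


def is_safe(method: str) -> bool:
    """Return True if the method is one of the safe methods."""
    return method in SAFE_METHODS


def confirm_before_sending(method: str, ask) -> bool:
    """Decide whether a request may be sent.

    Safe methods go out without a prompt. For anything else, `ask` (a
    caller-supplied function that shows a warning and returns True or
    False) gets the final say.
    """
    if is_safe(method):
        return True
    return ask(f"{method} may cause an action on the server. Continue?")


# A stand-in for a real dialog: this "user" always declines.
always_decline = lambda prompt: False
print(confirm_before_sending("GET", always_decline))
print(confirm_before_sending("POST", always_decline))
```

The check only tells the user what the method promises. Whether a GET really leaves the server untouched is still up to the people who built the site.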
GET is the most common method. It usually is used to ask a server to send a resource. HTTP/1.1 requires servers to implement this method. Figure 3-7 shows an example of a client making an HTTP request with the GET method.
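As a rough sketch of what a GET request looks like on the wire, the Python below assembles one by hand. The helper build_get_request, the path, and the host www.example.com are placeholders of our own. The request includes a Host header, which HTTP/1.1 requires, and carries no body.

```python
def build_get_request(path: str, host: str) -> bytes:
    """Assemble a minimal HTTP/1.1 GET request as raw bytes."""
    lines = [
        f"GET {path} HTTP/1.1",  # start line: method, request URL, version
        f"Host: {host}",         # HTTP/1.1 requests must name the host
        "",                      # empty line: end of the header block
        "",                      # nothing follows, since GET has no body here
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_get_request("/tools.html", "www.example.com")
print(request.decode("ascii"))
```

In a real program you would normally let an HTTP library build and send the request; writing it out by hand is useful mainly for seeing the message format.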
Status Codes
HTTP status codes are classified into five broad categories, as shown earlier in Table 3-2. This section summarizes the HTTP status codes for each of the five classes.
The status codes provide an easy way for clients to understand the results of their transactions. In this section, we also list example reason phrases, though there is no real guidance on the exact text for reason phrases. We include the recommended reason phrases from the HTTP/1.1 specification.
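Since the class of a status code is determined by its first digit, a client can sort any three-digit code into one of the five categories even if it does not recognize the specific code. The Python sketch below does this; status_class and the category labels are our own naming. For reason phrases it leans on Python's standard http.HTTPStatus enumeration, which supplies a conventional phrase for each code it knows.

```python
from http import HTTPStatus

# The five classes, keyed by the first digit of the status code.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def status_class(code: int) -> str:
    """Return the broad category for a three-digit HTTP status code."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


for code in (100, 200, 302, 404, 500):
    # HTTPStatus(code).phrase is the standard library's reason phrase.
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```

Programs should make decisions on the numeric code rather than on the reason phrase, because the phrase text is free to vary from server to server.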
HTTP/1.1 introduced the informational status codes to the protocol. They are relatively new and subject to a bit of controversy about their complexity and perceived value. Table 3-6 lists the defined informational status codes.
Table 3-6: Informational status codes and reason phrases
Status code | Reason phrase | Meaning