How HTTP Clients Work
Once the server is set up, we can get down to business. The client has the easy end: it wants web action on a particular site, so it sends a request with a URL. The URL begins with http to indicate what service it wants (other common services are ftp, for the File Transfer Protocol, and https, for HTTP over the Secure Sockets Layer, or SSL) and continues with these possible parts:
//<user>:<password>@<host>:<port>/<url-path>
RFC 1738 says:
Some or all of the parts “<user>:<password>@”, “:<password>”, “:<port>”, and “/<url-path>” may be omitted. The scheme specific data start with a double slash “//” to indicate that it complies with the common Internet scheme syntax.
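Python's standard library splits a URL into these same components. The sketch below assumes Python 3 and uses `urllib.parse.urlsplit`. The helper name `url_parts` is our own, and the user name and password in the sample URL are made up for illustration. Any part that is omitted comes back as `None`, or as an empty string in the case of the path.

```python
from urllib.parse import urlsplit

def url_parts(url: str) -> dict:
    """Split a URL into the RFC 1738 <user>:<password>@<host>:<port>/<url-path> pieces.

    url_parts is a hypothetical helper; urlsplit itself is standard library.
    """
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,      # "http", "ftp", "https", ...
        "user": parts.username,      # None when "<user>:<password>@" is omitted
        "password": parts.password,  # None when ":<password>" is omitted
        "host": parts.hostname,      # host name, lower-cased
        "port": parts.port,          # None when ":<port>" is omitted
        "path": parts.path,          # "" when "/<url-path>" is omitted
    }

# A fully specified URL (invented credentials) and a typical everyday one.
print(url_parts("http://fred:secret@www.apache.org:8000/some/where/foo.html"))
print(url_parts("http://www.apache.org"))
```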
In real life, URLs look more like http://www.apache.org/, with no user and password pair and no port. What happens?
The browser observes that the URL starts with http: and deduces that it should be using the HTTP protocol. The client then contacts a name server, which uses DNS to resolve www.apache.org to an IP address. At the time of writing, this was 63.251.56.142. One way to check the validity of a hostname is to go to the operating-system prompt[8] and type:
ping www.apache.org
If that host is connected to the Internet, a response is returned:
Pinging www.apache.org [63.251.56.142] with 32 bytes of data:

Reply from 63.251.56.142: bytes=32 time=278ms TTL=49
Reply from 63.251.56.142: bytes=32 time=620ms TTL=49
Reply from 63.251.56.142: bytes=32 time=285ms TTL=49
Reply from 63.251.56.142: bytes=32 time=290ms TTL=49

Ping statistics for 63.251.56.142:
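The name lookup that the browser performs can also be done from a program. This is a minimal sketch, assuming Python 3. It asks the system resolver, which consults DNS as needed, for a host's IPv4 addresses using `socket.getaddrinfo`. The function name `resolve_ipv4` is hypothetical.

```python
import socket

def resolve_ipv4(host: str, port: int = 80) -> list[str]:
    """Return the distinct IPv4 addresses the system resolver reports for host.

    resolve_ipv4 is a hypothetical helper built on socket.getaddrinfo.
    """
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    addresses = []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is an (address, port) pair.
    for *_, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses
```

Calling `resolve_ipv4("www.apache.org")` needs a working Internet connection, just as ping does. The address it returns today will almost certainly differ from the one shown above.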
A URL can be made more precise by attaching a port number. The web address http://www.apache.org doesn't include a port because it uses port 80, the default, which the browser takes for granted. If some other port is wanted, it is included in the URL after a colon; for example, http://www.apache.org:8000/. We will have more to do with ports later.
The URL always includes a path, even if it is only /. If a careless user leaves the path out, most browsers put it back in. If the path were /some/where/foo.html on port 8000, the URL would be http://www.apache.org:8000/some/where/foo.html.
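The browser's habit of filling in the default port and a missing path can be sketched as follows. This assumes Python 3 and handles only http: URLs, whose default port is 80. `connection_target` is a hypothetical helper name.

```python
from urllib.parse import urlsplit

DEFAULT_HTTP_PORT = 80  # the port a browser assumes for http: URLs

def connection_target(url: str) -> tuple[str, int, str]:
    """Return (host, port, path) with the default port and path filled in.

    connection_target is a hypothetical helper for illustration.
    """
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles http: URLs")
    # No ":<port>" in the URL means the default port.
    port = parts.port if parts.port is not None else DEFAULT_HTTP_PORT
    # A missing path is put back in as "/".
    path = parts.path or "/"
    return parts.hostname, port, path
```

For example, `connection_target("http://www.apache.org")` yields `("www.apache.org", 80, "/")`. The URL from the text, `http://www.apache.org:8000/some/where/foo.html`, yields `("www.apache.org", 8000, "/some/where/foo.html")`.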
The client now makes a TCP connection to port 8000 on the server's IP address (63.251.56.142 in the ping example above). If it is using HTTP 1.0, it sends the following message down the connection:
GET /some/where/foo.html HTTP/1.0<CR><LF><CR><LF>
These carriage returns and line feeds (CRLF) are very important because they separate the HTTP header from its body. If the request were a POST, there would be data following. The server sends the response back and closes the connection. To see it in action, connect again to the Internet, get a command-line prompt, and type the following:
% telnet www.apache.org 80
> telnet www.apache.org 80
GET http://www.apache.org/foundation/contact.html HTTP/1.1
Host: www.apache.org
On Win98, telnet puts up a dialog box. Click connect → remote system, and change Port from “telnet” to “80”. In Terminal → preferences, check “local echo”. Then type this, followed by two Returns:
GET http://www.apache.org/foundation/contact.html HTTP/1.1
Host: www.apache.org
You should see text similar to that which follows.
Some implementations of telnet rather unnervingly don’t echo what you type to the screen, so it seems that nothing is happening. Nevertheless, a whole mess of response streams past:
Trying 64.125.133.20...
Connected to www.apache.org.
Escape character is '^]'.
HTTP/1.1 200 OK
Date: Mon, 25 Feb 2002 15:03:19 GMT
Server: Apache/2.0.32 (Unix)
Cache-Control: max-age=86400
Expires: Tue, 26 Feb 2002 15:03:19 GMT
Accept-Ranges: bytes
Content-Length: 4946
Content-Type: text/html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>Contact Information--The Apache Software Foundation</title>
</head>
<body bgcolor="#ffffff" text="#000000" link="#525D76">
<table border="0" width="100%" cellspacing="0">
<tr><!-- SITE BANNER AND PROJECT IMAGE -->
<td align="left" valign="top">
<a href="http://www.apache.org/"><img src="../images/asf_logo_wide.gif" alt="The Apache Software Foundation" align="left" border="0"/></a>
</td>
</tr>
</table>
<table border="0" width="100%" cellspacing="4">
<tr><td colspan="2"><hr noshade="noshade" size="1"/></td></tr>
<tr>
<!-- LEFT SIDE NAVIGATION -->
<td valign="top" nowrap="nowrap">
<p><b><a href="/foundation/projects.html">Apache Projects</a></b></p>
<menu compact="compact">
<li><a href="http://httpd.apache.org/">HTTP Server</a></li>
<li><a href="http://apr.apache.org/">APR</a></li>
<li><a href="http://jakarta.apache.org/">Jakarta</a></li>
<li><a href="http://perl.apache.org/">Perl</a></li>
<li><a href="http://php.apache.org/">PHP</a></li>
<li><a href="http://tcl.apache.org/">TCL</a></li>
<li><a href="http://xml.apache.org/">XML</a></li>
<li><a href="/foundation/conferences.html">Conferences</a></li>
<li><a href="/foundation/">Foundation</a></li>
</menu>
...... and so on
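What telnet does by hand can also be scripted. The sketch below assumes Python 3 and works in four steps:

1. It opens a TCP connection with `socket.create_connection`.
2. It sends a GET request whose lines end in CRLF and whose header ends with an empty line.
3. It reads until the server closes the connection.
4. It splits the headers from the body at the first blank line.

The `Connection: close` header is our own addition. Without it, an HTTP/1.1 server keeps the connection open after responding. The function names `build_request`, `split_response`, and `fetch` are hypothetical.

```python
import socket

CRLF = "\r\n"

def build_request(host: str, path: str, version: str = "HTTP/1.1") -> bytes:
    """Assemble a GET request; the trailing empty line ends the header."""
    lines = [f"GET {path} {version}"]
    if version == "HTTP/1.1":
        lines.append(f"Host: {host}")      # required in HTTP/1.1
        lines.append("Connection: close")  # our addition: ask the server to hang up after replying
    return (CRLF.join(lines) + CRLF + CRLF).encode("ascii")

def split_response(raw: bytes) -> tuple[str, dict[str, str], bytes]:
    """Split a raw response into (status line, headers, body) at the first blank line."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status, *header_lines = head.decode("iso-8859-1").split(CRLF)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")  # split on the first colon only
        headers[name.strip()] = value.strip()
    return status, headers, body

def fetch(host: str, port: int, path: str) -> tuple[str, dict[str, str], bytes]:
    """Send one request and read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while chunk := sock.recv(4096):  # recv returns b"" once the server closes
            chunks.append(chunk)
    return split_response(b"".join(chunks))
```

`build_request("www.apache.org", "/some/where/foo.html", "HTTP/1.0")` produces exactly the HTTP 1.0 request shown earlier, ending in two CRLF pairs. `fetch("www.apache.org", 80, "/foundation/contact.html")` performs roughly the same exchange as the telnet session, although today's reply will differ from this 2002 transcript. The body is returned undecoded. If the server uses chunked transfer coding, the body will therefore still contain the chunk-size lines.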
[8] The operating-system prompt is likely to be “>” (Win95) or “%” (Unix). When we say, for instance, “Type % ping,” we mean, “When you see ‘%’, type ‘ping’.”