Chapter 12. Programming for the Web

When we think about the World Wide Web, we normally think of applications (web browsers, web servers) and the many kinds of content that those applications move around the network. But it’s important to note that standards and protocols, not the applications themselves, have enabled the Web’s growth. Since the earliest days of the Internet, there have been ways to move files from here to there and document formats that were just as good as HTML, but there was no unifying model for how to identify, retrieve, and display information, nor was there a universal way for applications to interact with that data over the network. HTML came to provide a common format for documents. In this chapter, we’re going to talk about how to use HTTP, the protocol that governs communications between web clients and servers, and URLs, which provide a standard for naming and addressing objects on the Web.

We’re also going to talk about web programming: making the Web intelligent, making it do what you want. This involves writing code for both clients and servers. Java provides a powerful API for dealing with URLs, which will be the first focus of our discussion. Then we’ll discuss how to write web clients that can interact with programs on the server through the Common Gateway Interface (CGI), using the HTTP GET and POST methods. Finally, we’ll take a look at servlets, simple Java programs that run on web servers and provide an effective way to build intelligence into your web pages. Servlets have been one of the most important and popular developments in Java over the past couple of years.

Uniform Resource Locators (URLs)

A URL points to an object on the Internet. It’s (usually) a text string that identifies an item, tells you where to find it, and specifies a method for communicating with it or retrieving it from its source. A URL can refer to any kind of information source. It might point to static data, such as a file on a local filesystem, a web server, or an FTP archive; or it can point to a more dynamic object such as a news article on a news spool or a record in a database. URLs can even refer to less tangible resources such as telnet sessions and mailing addresses.

The Java URL classes provide an API for accessing well-defined networked resources, like documents and applications on servers. The classes use an extensible set of prefabricated protocol and content handlers to perform the necessary communication and data conversion for accessing URL resources. With URLs, an application can fetch a complete file or database record from a server on the network with just a few lines of code. Applications like web browsers, which deal with networked content, use the URL class to simplify the task of network programming. They also take advantage of the dynamic nature of Java, which allows handlers for new types of URLs to be added on the fly. As new types of servers and new formats for content evolve, additional URL handlers can be supplied to retrieve and interpret the data without modifying the original application.
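To make the "few lines of code" claim concrete, here is a minimal sketch of fetching a resource through the URL class. The class name FetchUrl, the helper method fetch(), and the address www.example.com are our own placeholders, and the sketch assumes the resource is text encoded as UTF-8; real content may need a different encoding or a binary stream.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

public class FetchUrl {
    /** Reads the whole resource named by the address and returns it as text. */
    static String fetch(String address) throws IOException {
        // URI.create() parses the string; toURL() selects a protocol handler.
        URL url = URI.create(address).toURL();
        // openStream() connects to the resource and returns its raw bytes.
        // Assumption: the bytes are UTF-8 text.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            return in.lines().collect(Collectors.joining("\n"));
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder address; pass a different one as the first argument.
        String address = args.length > 0 ? args[0] : "http://www.example.com/";
        System.out.println(fetch(address));
    }
}
```

The same method works unchanged for other URL types that the runtime has a handler for, such as a file: URL naming a local file. That is the practical benefit of letting the protocol handler do the communication.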

A URL is usually presented as a string of text, like an address.[38] Since there are many different ways to locate an item on the Net, and different media and transports require different kinds of information, there are different formats for different kinds of URLs. The most common form has three components: a protocol for communicating with the host, a network host or server, and the name of the item together with its location on that host:

            protocol://hostname/location/item-name

Here, protocol (also called the “scheme”) is an identifier such as http, ftp, or gopher; hostname is an Internet hostname; and the location and item-name components form a path that identifies the object on that host. Variants of this form allow extra information to be packed into the URL, specifying things like port numbers for the communications protocol and fragment identifiers that reference parts inside the object.
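As a rough illustration of these components, the following sketch takes a URL apart with the accessor methods of java.net.URL. The class name UrlParts, the method describe(), and the sample address are hypothetical; the address includes a port number and a fragment identifier to show the variants mentioned above.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

public class UrlParts {
    /** Returns a one-line summary of the components of the given URL. */
    static String describe(String address) {
        try {
            URL url = URI.create(address).toURL();
            return "protocol=" + url.getProtocol()
                 + " host=" + url.getHost()
                 + " port=" + url.getPort()   // -1 when the URL names no port
                 + " path=" + url.getPath()
                 + " ref=" + url.getRef();    // fragment identifier, or null
        } catch (MalformedURLException e) {
            // Thrown when no handler exists for the protocol.
            throw new IllegalArgumentException("Unsupported URL: " + address, e);
        }
    }

    public static void main(String[] args) {
        // Sample address, made up for illustration.
        System.out.println(
            describe("http://www.example.com:8080/docs/guide/index.html#intro"));
        // prints: protocol=http host=www.example.com port=8080 path=/docs/guide/index.html ref=intro
    }
}
```

When the URL does not specify a port, getPort() returns -1; getDefaultPort() can then be used to ask for the protocol's standard port.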

We sometimes speak of a URL that is relative to another URL, called a base URL. In that case we are using the base URL as a starting point and supplying additional information. For example, the base URL might point to a directory on a web server; a relative URL might name a particular file in that directory.
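The sketch below shows one way to combine a base URL and a relative URL, using the two-argument URL constructor. The class name RelativeUrl, the method resolve(), and the addresses are placeholders of ours. Note that the URL constructors are deprecated as of Java 20; on newer releases, the resolve() method of java.net.URI is an alternative.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

public class RelativeUrl {
    /** Resolves a relative URL string against an absolute base URL string. */
    @SuppressWarnings("deprecation") // URL constructors are deprecated as of Java 20
    static String resolve(String base, String relative) {
        try {
            URL baseUrl = URI.create(base).toURL();
            // The base URL supplies whatever the relative string leaves out:
            // protocol, host, and the directory part of the path.
            URL resolved = new URL(baseUrl, relative);
            return resolved.toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Cannot resolve " + relative, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical base URL pointing at a directory on a web server.
        String base = "http://www.example.com/docs/guide/";
        System.out.println(resolve(base, "chapter12.html"));
        // prints: http://www.example.com/docs/guide/chapter12.html
    }
}
```

If the base URL names a file rather than a directory, the relative name replaces the last path component, so resolving chapter12.html against http://www.example.com/docs/guide/index.html yields the same result.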



[38] The term URL was coined by the Uniform Resource Identifier (URI) working group of the IETF to distinguish URLs from the more general notion of Uniform Resource Names or URNs. URLs are really just static addresses, whereas URNs would be more persistent and abstract identifiers used to resolve the location of an object anywhere on the Net. URLs are defined in RFC 1738 and RFC 1808.
