This book is a compilation of some fairly diverse reference material. What links these topics is that they are crucial knowledge for today’s webmaster in a Unix environment.
In this chapter, we give the world’s quickest introduction to web technology and the role of the webmaster who breathes life into each web document. If you want to learn more about the history of the Web, how to make your web pages “cool,” the social impact of the Internet, or how to make money online, this is the wrong book.
This is a book by impatient writers for impatient readers. We’re less interested in the hype of the Web than we are in what makes it actually tick. We’ll leave it to the pundits to predict the future of the Web or to declare today’s technology already outdated. Too much analysis makes our heads spin; we just want to get our web sites online.
We’ve organized this book in a roughly “outside-in” fashion—that is, with the outermost layer (HTML) first and the innermost layer (the server itself) last. But since it’s a good idea for all readers to know how everything fits together, let’s take a minute to breeze through a description of the Web from the inside-out: no history, no analysis, just the technology basics.
The tool most people use on the Web is a browser, such as Netscape Navigator, Internet Explorer, Opera, Mosaic, or Lynx. Web browsers work by connecting over the Internet to remote machines, requesting specific documents, and then formatting the documents they receive for viewing on the local machine.
The language, or protocol, used for web transactions is Hypertext Transfer Protocol, or HTTP. The remote machines containing the documents run HTTP servers that wait for requests from browsers and then return the specified document. The browsers themselves are technically HTTP clients.
One of the most important things to grasp when working on the Web is the format for URLs. A URL is basically an address on the Web, identifying each document uniquely (for example, http://www.oreilly.com/products.html). Since URLs are so fundamental to the Web, we discuss them here in a little detail. The simple syntax for a URL is:
Most URLs you encounter follow this simple syntax. A more generalized syntax, however, is:
- extra-path-info and query-info
Optional information used by CGI programs. See Chapter 12 for more information.
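To make the pieces of a URL concrete, here is a minimal sketch that splits one apart with Python's standard urllib.parse module. The URL is hypothetical, invented only so that every optional piece is present; the port number, the CGI program name, and the query fields are all assumptions and do not refer to a real O'Reilly address.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL, made up so that each optional component appears once.
url = "http://www.oreilly.com:8080/cgi-bin/search.cgi/extra/info?topic=web&lang=en"

parts = urlsplit(url)

print(parts.scheme)    # the protocol part, before "://"
print(parts.hostname)  # the machine to contact
print(parts.port)      # the optional port, as an integer
print(parts.path)      # the path, including any extra path information
print(parts.query)     # everything after the "?"

# The query string is usually a set of name=value pairs for a CGI program.
query = parse_qs(parts.query)
print(query)
```

Note that the parser cannot tell where the path to the CGI program ends and the extra path information begins. Only the server knows which part of the path names an actual program, so that split happens on the server side.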
HTML documents also often use a “shorthand” for linking to other documents on the same server, called a relative URL. An example of a relative URL is images/webnut.gif. The browser knows to translate this into complete URL syntax before sending the request. For example, if http://www.oreilly.com/books/webnut.html contains a reference to images/webnut.gif, the browser reconstructs the relative URL as a full (or absolute) URL, http://www.oreilly.com/books/images/webnut.gif, and requests that document independently (if needed).
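The translation a browser performs can be approximated with urljoin from Python's standard urllib.parse module. This is only a sketch of the resolution rules, not a description of how any particular browser is implemented. The first relative URL is the one from the example above; the other two are hypothetical additions.

```python
from urllib.parse import urljoin

# The document that contains the links.
base = "http://www.oreilly.com/books/webnut.html"

# A relative URL is resolved against the directory of the base document.
print(urljoin(base, "images/webnut.gif"))
# http://www.oreilly.com/books/images/webnut.gif

# Hypothetical link: ".." climbs one directory level.
print(urljoin(base, "../index.html"))
# http://www.oreilly.com/index.html

# Hypothetical link: a leading slash starts again from the server root.
print(urljoin(base, "/products.html"))
# http://www.oreilly.com/products.html
```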
Often in this book, you’ll see us refer to a URI, not a URL. A URI (Uniform Resource Identifier) is a superset of URL, defined in anticipation of different resource naming conventions being developed for the Web. For the time being, however, the only URI syntax in practice is URL; so while purists might complain, you can safely assume that “URI” is synonymous with “URL” and not go wrong (yet).
While web documents can conceivably be in any format, the universal standard is Hypertext Markup Language (HTML), a language for creating formatted text interspersed with images, sounds, animation, and hypertext links to other documents anywhere on the Web. Chapter 2 through Chapter 8 cover the most current version of HTML.
In 1996, a significant extension to HTML was developed in the form of Cascading Style Sheets (CSS). Cascading Style Sheets allow web site developers to associate a number of style-related characteristics (such as font, color, and spacing) with a particular HTML tag. This enables HTML authors to create a consistent look and feel throughout a set of documents. Chapter 9 provides an overview of and a reference to CSS.
While HTML remains the widespread choice for web site development, there is also an heir apparent called XML (Extensible Markup Language). XML is a meta-language that allows you to define your own document tags. While XML’s development remains highly volatile, Chapter 10 gives you the basics.
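As a rough illustration of what "defining your own tags" means, the sketch below reads a tiny XML document with Python's standard xml.etree.ElementTree module. The tag names (catalog, book, title, pages), the isbn attribute, and all of the values are invented for this example; XML itself prescribes none of them.

```python
import xml.etree.ElementTree as ET

# A hypothetical document; every tag and value here is made up.
document = """<catalog>
  <book isbn="hypothetical-0001">
    <title>Example Title</title>
    <pages>123</pages>
  </book>
</catalog>"""

root = ET.fromstring(document)

# Collect one (isbn, title, pages) tuple per <book> element.
books = []
for book in root.findall("book"):
    title = book.findtext("title")
    pages = int(book.findtext("pages"))
    books.append((book.get("isbn"), title, pages))

print(books)
```

A generic parser can walk the structure without knowing what a "book" is. Attaching meaning to the tags is left to the application.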
In between clients and servers is the network, which uses TCP (Transmission Control Protocol) and IP (Internet Protocol) to transmit data and find servers and clients. On top of TCP/IP, clients and servers use HTTP to communicate. Chapter 17 gives details on HTTP, which you must understand to write CGI programs and server scripts, administer a web site, and do just about anything else that involves a server.
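To see HTTP riding on top of TCP/IP, the following sketch starts a throwaway server on the local machine, opens a plain TCP socket to it, and types out a request by hand. It uses only Python's standard library. The handler class, the requested path, and the page contents are assumptions made for the demonstration; a production server such as Apache does far more than this.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that returns the same small page for any GET."""

    def do_GET(self):
        body = b"<html><body>Hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demonstration quiet

# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# The client side: a TCP connection carrying a hand-written HTTP request.
with socket.create_connection(("127.0.0.1", port)) as conn:
    request = "GET /index.html HTTP/1.0\r\nHost: localhost\r\n\r\n"
    conn.sendall(request.encode("ascii"))
    chunks = []
    while True:
        data = conn.recv(4096)
        if not data:  # the server closes the connection when it is done
            break
        chunks.append(data)

server.shutdown()
server.server_close()

# A blank line separates the response headers from the document itself.
response = b"".join(chunks)
headers, _, body = response.partition(b"\r\n\r\n")
status_line = headers.split(b"\r\n")[0].decode("ascii")

print(status_line)
print(body.decode("ascii"))
```

The request is nothing more than a few lines of text followed by a blank line, and the response has the same shape with the document appended.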
The runaway leader among Unix-based web servers is Apache. Chapter 18 deals with configuring Apache, while Chapter 19 discusses the various Apache modules. Regardless of the type of server you’re running, there are various measures you can take to maximize its efficiency. Chapter 20 describes a number of these server optimization techniques.