Squid: The Definitive Guide

By Duane Wessels



Table of Contents

Chapter 1: Introduction
This long-overdue book is about Squid: a popular open source caching proxy for the Web. With Squid you can:
  • Use less bandwidth on your Internet connection when surfing the Web
  • Reduce the amount of time web pages take to load
  • Protect the hosts on your internal network by proxying their web traffic
  • Collect statistics about web traffic on your network
  • Prevent users from visiting inappropriate web sites at work or school
  • Ensure that only authorized users can surf the Internet
  • Enhance your users' privacy by filtering sensitive information from web requests
  • Reduce the load on your own web server(s)
  • Convert encrypted (HTTPS) requests on one side to unencrypted (HTTP) requests on the other
Squid's job is to be both a proxy and a cache. As a proxy, Squid is an intermediary in a web transaction. It accepts a request from a client, processes that request, and then forwards the request to the origin server. The request may be logged, rejected, and even modified before forwarding. As a cache, Squid stores recently retrieved web content for possible reuse later. Subsequent requests for the same content may be served from the cache, rather than contacting the origin server again. You can disable the caching part of Squid if you like, but the proxying part is essential.
Figure 1-1: Squid sits between clients and servers
As Figure 1-1 shows, Squid accepts HTTP (and HTTPS) requests from clients, and speaks a number of protocols to servers. In particular, Squid knows how to talk to HTTP, FTP, and Gopher servers. Conceptually, Squid has two "sides." The client-side talks to web clients (e.g., browsers and user-agents); the server-side talks to HTTP, FTP, and Gopher servers. These are called origin servers, because they are the origin location for the data they serve.
Note that Squid's client-side understands only HTTP (and HTTP encrypted with SSL/TLS). This means, for example, that you can't make an FTP client talk to Squid (unless the FTP client is also an HTTP client). Furthermore, Squid can't proxy protocols for email (SMTP), instant messaging, or Internet Relay Chat.
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Web Caching
Web caching refers to the act of storing certain web resources (i.e., pages and other data files) for possible future reuse. For example, Matilda is the first person in the office each morning, and she likes to read the local newspaper online with her wake-up coffee. As she visits the various sections, the Squid cache on their office network stores the HTML pages and JPEG images. Harry comes in a short while later and also reads the newspaper online. For him, the site loads much faster because much of the content is served from Squid. Additionally, Harry's browsing doesn't waste the bandwidth of the company's DSL line by transferring the exact same data as when Matilda viewed the site.
A cache hit occurs each time Squid satisfies an HTTP request from its cache. The cache hit ratio, or cache hit rate, is the percentage of all requests satisfied as hits. Web caches typically achieve hit ratios between 30% and 60%. A similar metric, the byte hit ratio, is the percentage of the total volume of data (i.e., number of bytes) that is served from the cache.
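As a rough illustration of the two metrics, the sketch below computes both from a handful of made-up requests. The two-column input (HIT or MISS, then a byte count) is a simplified format invented for this example, not the layout of Squid's own log files, and the hit_ratios function name is likewise hypothetical.

```shell
#!/bin/sh
# hit_ratios: read "RESULT BYTES" lines on stdin and print both ratios.
# The input format is an assumption made for illustration only.
hit_ratios() {
    awk '
        { reqs++; bytes += $2 }                    # every request counts toward the totals
        $1 == "HIT" { hits++; hit_bytes += $2 }    # hits also count toward the hit totals
        END {
            if (reqs == 0 || bytes == 0) { print "no data"; exit }
            printf "hit_ratio=%.1f%% byte_hit_ratio=%.1f%%\n", 100 * hits / reqs, 100 * hit_bytes / bytes
        }'
}

# Four sample requests: two small hits and two large misses.
printf '%s\n' "MISS 40000" "HIT 10000" "HIT 5000" "MISS 45000" | hit_ratios
```

In this invented sample, two of the four requests are hits (50%), but they account for only 15,000 of the 100,000 bytes transferred (15%), which shows why the two ratios are reported separately.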
A cache miss occurs when Squid can't satisfy a request from the cache. A miss can happen for any number of reasons. Obviously, the first time Squid receives a request for a particular resource, it is a cache miss. Similarly, Squid may have purged the cached copy to make room for new objects.
Another possibility is that the resource is uncachable. Origin servers can instruct caches on how to treat the response. For example, they can say that the data must never be cached, can be reused only within a certain amount of time, and so on. Squid also uses a few internal heuristics to determine what should, or should not, be saved for future use.
Cache validation is a process that ensures Squid doesn't serve stale data to the user. Before reusing a cached response, Squid often validates it with the origin server. If the server indicates that Squid's copy is still valid, the data is sent from Squid. Otherwise, Squid updates its cached copy as it relays the response to the client. Squid generally performs validation using timestamps. The origin server's response usually contains a
Additional content appearing in this section has been removed.
A Brief History of Squid
In the beginning was the CERN HTTP server. In addition to functioning as an HTTP server, it was also the first caching proxy. The caching module was written by Ari Luotonen in 1994.
That same year, the Internet Research Task Force Group on Resource Discovery (IRTF-RD) started the Harvest project. It was "an integrated set of tools to gather, extract, organize, search, cache, and replicate" Internet information. I joined the Harvest project near the end of 1994. While most people used Harvest as a local (or distributed) search engine, the Object Cache component was quite popular as well. The Harvest cache boasted three major improvements over the CERN cache: faster use of the filesystem, a single process design, and caching hierarchies via the Internet Cache Protocol.
Towards the end of 1995, many Harvest team members made the move to the exciting world of Internet-based startup companies. The original authors of the Harvest cache code, Peter Danzig and Anawat Chankhunthod, turned it into a commercial product. Their company was later acquired by Network Appliance. In early 1996, I joined the National Laboratory for Applied Network Research (NLANR) to work on the Information Resource Caching (IRCache) project, funded by the National Science Foundation. Under this project, we took the Harvest cache code, renamed it Squid, and released it under the GNU General Public License.
Since that time Squid has grown in size and features. It now supports a number of cool things such as URL redirection, traffic shaping, sophisticated access controls, numerous authentication modules, advanced disk storage options, HTTP interception, and surrogate mode (a.k.a. HTTP server acceleration).
Funding for the IRCache project ended in July 2000. Today, a number of volunteers continue to develop and support Squid. We occasionally receive financial or other types of support from companies that benefit from Squid.
Looking towards the future, we are rewriting Squid in C++ and, at the same time, fixing a number of design issues in the older code that limit new features. We are adding support for protocols such as Edge Side Includes (ESI) and Internet Content Adaptation Protocol (ICAP). We also plan to make Squid support IPv6. A few developers are constantly making Squid run better on Microsoft Windows platforms. Finally, we will add more and more HTTP/1.1 features and work towards full compliance with the latest protocol specification.
Additional content appearing in this section has been removed.
Hardware and Operating System Requirements
Squid runs on all popular Unix systems, as well as Microsoft Windows. Although Squid's Windows support is improving all the time, you may have an easier time with Unix. If you have a favorite operating system, I'd suggest using that one. Otherwise, if you're looking for a recommendation, I really like FreeBSD.
Squid's hardware requirements are generally modest. Memory is often the most important resource. A memory shortage causes a drastic degradation in performance. Disk space is, naturally, another important factor. More disk space means more cached objects and higher hit ratios. Fast disks and interfaces are also beneficial. SCSI performs better than ATA, if you can justify the higher costs. While fast CPUs are nice, they aren't critical to good performance.
Because Squid uses a small amount of memory for every cached response, there is a relationship between disk space and memory requirements. As a rule of thumb, you need 32 MB of memory for each GB of disk space. Thus, a system with 512 MB of RAM can support a 16-GB disk cache. Your mileage may vary, of course. Memory requirements depend on factors such as the mean object size, CPU architecture (32- or 64-bit), the number of concurrent users, and particular features that you use.
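The rule of thumb can be written as a pair of one-line calculations. This is only a sketch of the arithmetic in the paragraph above; the function and variable names are made up for the example, and the result is a starting point rather than a guarantee, for the reasons just listed.

```shell
#!/bin/sh
# Rule of thumb from the text: 32 MB of RAM per GB of disk cache.
MB_PER_GB_OF_CACHE=32

# max_cache_gb RAM_MB: largest disk cache (in GB) suggested for that much RAM.
max_cache_gb() {
    echo $(( $1 / MB_PER_GB_OF_CACHE ))
}

# ram_needed_mb CACHE_GB: RAM (in MB) suggested for a disk cache of that size.
ram_needed_mb() {
    echo $(( $1 * MB_PER_GB_OF_CACHE ))
}

max_cache_gb 512     # prints 16
ram_needed_mb 16     # prints 512
```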
People often ask such questions as, "I have a network with X users. What kind of hardware do I need for Squid?" These questions are difficult to answer for a number of reasons. In particular, it's hard to say how much traffic X users will generate. I usually find it easier to look at bandwidth usage, and go from there. I tell people to build a system with enough disk space to hold 3-7 days' worth of web traffic. For example, if your users consume 1 Mbps (HTTP and FTP traffic only) for 8 hours per day, that's about 3.5 GB per day. So, I'd say you want between 10 and 25 GB of disk space for each Mbps of web traffic.
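The bandwidth-based estimate is easy to reproduce. The sketch below assumes decimal units (1 Mbps = 1,000,000 bits per second, 1 GB = 1,000,000,000 bytes) and a constant traffic rate during the busy hours; the disk_estimate name is hypothetical. With those assumptions the figures come out slightly above the rounded numbers quoted in the text.

```shell
#!/bin/sh
# disk_estimate MBPS HOURS_PER_DAY: daily web traffic volume and the disk
# space needed to hold 3 to 7 days of it (decimal units assumed).
disk_estimate() {
    awk -v mbps="$1" -v hours="$2" 'BEGIN {
        # bits/s to bytes/s, times seconds of use per day, expressed in GB
        gb_per_day = mbps * 1000000 / 8 * hours * 3600 / 1000000000
        printf "%.1f GB/day, %.1f to %.1f GB for 3 to 7 days\n", gb_per_day, 3 * gb_per_day, 7 * gb_per_day
    }'
}

disk_estimate 1 8    # the example from the text: 1 Mbps for 8 hours per day
```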
Additional content appearing in this section has been removed.
Squid Is Open Source
Squid is free software and a collaborative project. If you find Squid useful, please consider contributing back to the project in one or more of the following ways:
  • Participate on the squid-users discussion list. Answer questions and help out new users.
  • Try out new versions and report bugs or other problems.
  • Contribute to the online documentation and Frequently Asked Questions (FAQ). If you notice an inconsistency, report it to the maintainers.
  • Submit your local modifications back to the developers for inclusion into the code base.
  • Provide financial support to one or more developers through small development contracts.
  • Tell the developers about features you would like to have.
  • Tell your friends and colleagues that Squid is cool.
Squid is released as free software under the GNU General Public License. This means, for example, that anyone who distributes Squid must make the source code available to you. See http://www.gnu.org/licenses/gpl-faq.html for more information about the GPL.
Additional content appearing in this section has been removed.
Squid's Home on the Web
The main source for up-to-date information about Squid is http://www.squid-cache.org. There you can:
  • Download the source code.
  • Read the FAQ and other documentation.
  • Subscribe to the mailing list, or read the archives.
  • Contact the developers.
  • Find links to third-party applications.
  • And more!
Additional content appearing in this section has been removed.
Getting Help
Given that Squid is free software, you may need to rely on the kindness of strangers for occasional assistance. The best place to do this is the squid-users mailing list. Before posting a message to the mailing list, however, you should check Squid's FAQ document to see if your question has already been asked and answered. If neither resource provides the help you need, you can contact one of the many services offering professional support for Squid.
Squid's FAQ document, located at http://www.squid-cache.org/Doc/FAQ/FAQ.html, is a good source of information for new users. The FAQ evolves over time, so it will contain entries written after this book. The FAQ also contains some historical information that may be irrelevant today.
Even so, the FAQ is one of the first places you should look for answers to your questions. This is especially true if you are a new user. While it is certainly less effort for you to simply write to the mailing list for help, veteran mailing list members grow tired of reading and answering the same questions. If your question is frequently asked, it may simply be ignored.
The FAQ is quite large. The HTML version exists as approximately 25 different chapters, each in a separate file. These can be difficult to search for keywords and awkward to print. You can also download PostScript, PDF, and text versions by following links at the top of the HTML version.
Squid has three mailing lists you might find useful. I explain how to become a subscriber below, but you may want to check Squid's mailing list page, http://www.squid-cache.org/mailing-lists.html, for possibly more up-to-date information.

Section 1.6.2.1: squid-users

The squid-users mailing list is an excellent place to find answers for such questions as:
  • How do I ... ?
  • Is this a bug ... ?
  • Does this feature/program work on my platform?
  • What does this error message mean?
Note that you must subscribe before you can post a message. To subscribe to the squid-users list, send a message to squid-users-subscribe@squid-cache.org.
If you prefer, you can receive the digest version of the list. In this case, you'll receive multiple postings in a single email message. To sign up this way, send a message to
Additional content appearing in this section has been removed.
Getting Started with Squid
If you are new to Squid, the next few chapters will help you get started. First, I'll show you how to get the code, either the original source or precompiled binaries. In Chapter 3, I go through the steps necessary to compile and install Squid on your Unix system; this chapter is important because you'll probably need to tune your system before compiling the source code. Chapter 4 provides a very brief introduction to Squid's configuration file. Finally, Chapter 5 explains how to run Squid.
If you've already had a little experience installing and running Squid, you may want to skip ahead to Chapter 6.
Additional content appearing in this section has been removed.
Exercises
  • Visit the Squid site and locate the squid-users mailing list archive. Browse the messages for the past few weeks.
  • Search the Squid FAQ for information about file descriptors.
  • Check one of the Squid mirror sites. Is it up to date with the primary site?
Additional content appearing in this section has been removed.
Chapter 2: Getting Squid
Squid is normally distributed as source code. This means you'll probably need to compile it, as described in Chapter 3. The installation process should be relatively painless. The developers put a lot of effort into making sure Squid compiles easily on all the popular operating systems.
You can also find precompiled binaries for some operating systems. Linux users can get Squid in one of the various package formats (e.g., RPM or Debian packages). The FreeBSD, NetBSD, and OpenBSD projects offer Squid ports. The BSD ports aren't binary distributions but rather a small set of files that know how to download, compile, and install the Squid source. While these precompiled or preconfigured packages may be easier to install, I recommend that you download and compile the source yourself.
Anonymous CVS is a great way for developers and users to stay current with the official source tree. Instead of downloading entire new releases, you run a command to retrieve only the parts that have changed since your last update.
Additional content appearing in this section has been removed.
Versions and Releases
The Squid developers make periodic releases of the source code. Each release has a version number, such as 2.5.STABLE4. The third component starts either with STABLE or DEVEL (short for development).
As you can probably guess, the DEVEL releases tend to have newer, experimental features. They are also more likely to have bugs. Inexperienced users should not run DEVEL releases. If you choose to try a DEVEL release, and you encounter problems, please report them to the Squid maintainers.
After spending some time in the development state, the version number changes to STABLE. These releases are suitable for all users. Of course, even the stable releases may have some bugs. The higher-numbered stable versions (e.g., STABLE3, STABLE4) are likely to have fewer bugs. If you are really concerned about stability, you may want to wait for one of these later releases.
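If you script your downloads or upgrades, the release type can be read straight from the version string. The sketch below relies only on the naming scheme described above (a third component starting with STABLE or DEVEL); the release_kind function name is made up for the example.

```shell
#!/bin/sh
# release_kind VERSION: classify a Squid version string by its third component.
release_kind() {
    case "$1" in
        *.STABLE*) echo "stable" ;;
        *.DEVEL*)  echo "development" ;;
        *)         echo "unknown" ;;
    esac
}

release_kind 2.5.STABLE4    # prints stable
```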
Additional content appearing in this section has been removed.
Use the Source, Luke
So why can't you just copy a precompiled binary to your system and expect it to work perfectly? The primary reason is that the code needs to know about certain operating system parameters. In particular, the most important parameter is the maximum number of open file descriptors. Squid's ./configure script (see Section 3.4) probes for these values before compiling. If you take a Squid binary built for one value and run it on a system with a different value, you may encounter problems.
Another reason is that many of Squid's features must be enabled at compile time. If you take a binary that somebody else compiled, and it doesn't include the code for the features that you want, you'll need to compile your own version anyway.
Finally, note that shared libraries sometimes make it difficult to share executable files between systems. Shared libraries are loaded at runtime. This is also known as dynamic linking. Squid's ./configure script probes your system to find out certain things about your C library functions (if they are present, if they work, etc.). Although library functions don't usually change, it is possible that two different systems have slightly different shared C libraries. This may become a problem for Squid if the two systems are different enough.
Getting the Squid source code is really quite easy. To get it, visit the Squid home page, http://www.squid-cache.org/. The home page has links to the current stable and development releases. If you aren't located in the United States, you can select one of the many mirror sites. The mirror sites are usually named "wwwN.CC.squid-cache.org," where N is a number and CC is a two-letter country code. For example, www1.au.squid-cache.org is an Australian mirror site. The home page has links to the current mirror sites.
Each Squid release branch (e.g., Squid-2.5) has its own HTML page. This page has links to the source code releases and "diffs" between releases. If you are upgrading from one release to the next, you may want to download the diff file and apply the patch as described in Section 3.7. The release pages describe the new features and important changes in each version, and also have links to bugs that have been fixed.
Additional content appearing in this section has been removed.
Precompiled Binaries
Some Unix distributions include, or make available, precompiled Squid packages. For Linux, you can easily find Squid RPMs. Often the Squid RPM is included on Linux CD-ROMs you can buy. The FreeBSD/NetBSD/OpenBSD distributions also contain Squid in their ports and/or packages collections.
While RPMs and precompiled packages may initially save you some time, they also have some drawbacks. As I already mentioned, certain features must be enabled or disabled before you start compiling Squid. The precompiled package that you install may not have the particular feature you want. Furthermore, Squid's ./configure script probes your operating system for certain parameters. These parameters may be configured differently on your machine than on the machine on which Squid was compiled. Finally, if you want to apply a patch to Squid, you'll either have to wait for someone to build a new RPM/package or get the source and do it yourself.
I strongly encourage you to compile Squid from the source, but the decision is yours to make.
Additional content appearing in this section has been removed.
Anonymous CVS
The Concurrent Versions System (CVS) is a nifty package that allows you to simultaneously edit and manage source code and other files. Almost every open source software project uses CVS.
You can anonymously access Squid's CVS files (read-only) to keep your source code up to date. The nice thing about CVS is that you can easily retrieve only the changes (diffs) relative to your current version. Thus, it is easy to see what has changed recently. Applying the changes to your current files efficiently synchronizes your source code with the official version.
CVS uses a tree-like indexing system. The trunk of the tree is called the head branch. For Squid's repository, this is where all new changes and features are placed. The head branch usually contains experimental, and possibly unstable, code. The stable code is typically found on other branches.
To effectively use Squid's anonymous CVS server, you first need to understand how different versions and branches are tagged. For example, the Version 2.5 branch is named SQUID_2_5. Particular releases, which represent a snapshot in time, have longer names, such as SQUID_2_5_STABLE4. To get exactly Squid Version 2.5.STABLE4, use the SQUID_2_5_STABLE4 tag; to get the latest code on the 2.5 branch, use SQUID_2_5.
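The two examples above suggest a simple pattern: prefix the version with SQUID_ and replace each dot with an underscore. The helper below is a sketch of that pattern, inferred only from those two tag names; the squid_cvs_tag function name is hypothetical, and other tags in the repository may not follow the same rule.

```shell
#!/bin/sh
# squid_cvs_tag VERSION: derive a CVS tag name from a version string,
# assuming the SQUID_ prefix and dot-to-underscore pattern shown in the text.
squid_cvs_tag() {
    printf 'SQUID_%s\n' "$(printf '%s' "$1" | tr '.' '_')"
}

squid_cvs_tag 2.5.STABLE4    # prints SQUID_2_5_STABLE4
squid_cvs_tag 2.5            # prints SQUID_2_5
```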
To use the Squid anonymous CVS server, you first need to set the CVSROOT environment variable:
csh% setenv CVSROOT :pserver:anoncvs@cvs.squid-cache.org:/squid
Or, for Bourne shell users:
sh$ CVSROOT=:pserver:anoncvs@cvs.squid-cache.org:/squid
sh$ export CVSROOT
You then log in to the server:
% cvs login
(Logging in to anoncvs@cvs.squid-cache.org)
CVS password:
At the prompt, enter anoncvs for the password. Now you can check out the source tree with this command:
% cvs checkout -r SQUID_2_5 -d squid-2.5 squid
The -r option specifies the revision tag to retrieve. Omitting the -r option gets you the head branch. The -d option changes the top-level directory name in which files are placed. If you omit the -d option, the top-level directory is the same as the module name. The final command-line argument (squid) is the name of the module to check out.
Additional content appearing in this section has been removed.
devel.squid-cache.org
The Squid developers maintain a separate site, currently hosted at SourceForge, for experimental Squid features. Check it out at http://devel.squid-cache.org/. There you'll find a number of cutting-edge development projects that haven't yet been integrated into the official Squid code base. You can access these projects through SourceForge's anonymous CVS server or download diff files based on the standard releases.
Additional content appearing in this section has been removed.
Exercises
  • Visit the Squid web site or FTP server and look at the recent stable and development releases. How often are new releases made?
  • Download the most recent stable code.
  • Use Squid's anonymous CVS server to check out the recent stable branch. Change one of the source files by inserting a blank line, then run cvs diff.
Additional content appearing in this section has been removed.
Chapter 3: Compiling and Installing
Squid is designed to be portable and should compile on all major Unix systems, including Linux, BSD/OS, FreeBSD, NetBSD, OpenBSD, Solaris, HP-UX, OSF/DUNIX/TRU-64, Mac OS X, IRIX, and AIX. Squid also runs on Microsoft Windows. Please see Appendix E for instructions on compiling and running Squid on Windows.
Compiling Squid is relatively straightforward. If you've installed more than a few open source packages, you're probably already familiar with the procedure. You first use a program called ./configure to probe your system and then a program called make to do the actual compiling.
Before getting to that step, however, let's talk about tuning your system in preparation for Squid. Your operating system may have default resource limits that are too low for Squid to run correctly. Most importantly, you need to worry about the number of available file descriptors.
Additional content appearing in this section has been removed.
Before You Start
If you've been using Unix for a while, chances are that you've already compiled a number of other software packages. If so, you can probably quickly scan this chapter. The procedure for compiling and installing Squid is similar to many other software distributions.
To compile Squid, you need an ANSI C compiler. Don't be too alarmed by the "ANSI" part. Chances are that if you already have a C compiler, it is compliant with the ANSI specification. The GNU C compiler (gcc) is an excellent choice and widely available. Most operating systems come with a C compiler as a part of the standard installation. The common exceptions are Solaris and HP-UX. If you're using one of those operating systems, you might not have a compiler installed.
Ideally you should compile Squid on the same system on which it will run. Part of the installation process probes your system for certain parameters, such as the number of available file descriptors. However, if your system doesn't have a C compiler, you may be able to compile Squid elsewhere and then copy the binaries back. If the operating systems are different, Squid may encounter some problems. Also, Squid may become confused if the two systems have different kernel configurations.
In addition to a C compiler, you'll also need Perl and awk. awk is a standard program on all Unix systems, so you shouldn't need to worry about it. Perl is quite common, but it may not be installed on your system by default. You may need the gzip program to uncompress the source distribution file.
Solaris users, make sure that /usr/ccs/bin is in your PATH, even if you're using gcc. To compile Squid, you may need the make and ar programs found in that directory.
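A quick way to confirm that these programs are on your PATH is to ask the shell for each one. The sketch below uses the standard command -v lookup; the missing_tools function name is made up, and the tool list (including cc as the compiler name) is only an assumption based on the programs mentioned above.

```shell
#!/bin/sh
# missing_tools NAME...: print each named program that can't be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Prints nothing if everything is available; substitute gcc for cc if needed.
missing_tools cc perl awk gzip tar make ar
```

If the command prints make or ar on Solaris, check whether /usr/ccs/bin is in your PATH.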
Additional content appearing in this section has been removed.
Unpacking the Source
After downloading the source distribution, you need to unpack it somewhere. The particular location doesn't really matter. You can unpack Squid in your home directory or anywhere; you'll need about 20 MB of free disk space. Personally, I like to use /tmp. Use the tar command to extract the source directory:
% cd /tmp
% tar xzvf /some/where/squid-2.5.STABLE4-src.tar.gz
squid-2.5.STABLE4/
squid-2.5.STABLE4/CONTRIBUTORS
squid-2.5.STABLE4/COPYING
squid-2.5.STABLE4/COPYRIGHT
squid-2.5.STABLE4/CREDITS
squid-2.5.STABLE4/ChangeLog
squid-2.5.STABLE4/INSTALL
squid-2.5.STABLE4/QUICKSTART
squid-2.5.STABLE4/README
...
Some tar programs don't have the z option, which automatically uncompresses gzip files. In that case, you'll need to use this command:
% gzip -dc /some/where/squid-2.5.STABLE4-src.tar.gz | tar xvf -
Once the source code has been unpacked, the next step is usually to configure the source tree. However, if this is the first time you're compiling Squid, you should make sure certain kernel resource limits are high enough; to find out how, read on.
Additional content appearing in this section has been removed.
Pretuning Your Kernel
Squid requires a fair amount of kernel resources under moderate and high loads. In particular, you may need to configure your system with a higher-than-normal number of file descriptors and mbuf clusters. The file-descriptor limit can be especially annoying. You'd be better off to increase the limit before compiling Squid.
At this point, you might be tempted to get the precompiled binaries to avoid the hassle of building a new kernel. Unfortunately, you need to make a new kernel, regardless. Squid and the kernel exchange information through data structures that must not exceed the set file-descriptor limits. Squid checks these limits at runtime and uses the safest (smallest) value. Thus, even if a precompiled binary was built with a higher file-descriptor limit than the kernel allows, the kernel value takes precedence.
To change some settings, you must build and install a new kernel. This procedure varies among different operating systems. Consult Unix System Administration Handbook (Prentice Hall) or your operating-system documentation if necessary. If you're using Linux, you probably don't need to recompile your kernel.
File descriptors are simply integers that identify each file and socket that a process has opened. The first opened file is 0, the second is 1, and so on. Unix operating systems usually impose a limit on the number of file descriptors that each process can open. Furthermore, Unix also normally has a systemwide limit.
Because of the way Squid works, the file-descriptor limits may adversely affect performance. When Squid uses up all the available file descriptors, it is unable to accept new connections from users. In other words, running out of file descriptors causes denial of service. Squid can't accept new requests until some of the current requests complete, and the corresponding files and sockets are closed. Squid issues a warning when it detects a file-descriptor shortage.
You can save yourself some trouble by making sure the file descriptor limits are appropriate before running ./configure. In most cases, 1024 file descriptors will be sufficient. Very busy caches may require 4096 or more. When configuring file descriptor limits, I recommend setting the systemwide limit to twice the per-process limit.
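A quick way to see where you stand before running ./configure is to compare the shell's current per-process limit against the 1024 figure mentioned above. The following is only a sketch: the function name fd_limit_ok is made up for this illustration, and it assumes your shell's ulimit builtin accepts -n, which is common but not universal.

```shell
#!/bin/sh
# fd_limit_ok REQUIRED CURRENT (hypothetical helper)
# Succeeds when the limit CURRENT satisfies REQUIRED.
fd_limit_ok() {
    required=$1
    current=$2
    # A limit reported as "unlimited" always satisfies the requirement.
    [ "$current" = "unlimited" ] && return 0
    [ "$current" -ge "$required" ]
}

# Per-process soft limit of the current shell (assumes ulimit -n is supported).
current=$(ulimit -n)

if fd_limit_ok 1024 "$current"; then
    echo "limit of $current file descriptors should be sufficient in most cases"
else
    echo "limit of $current is below 1024; raise it before running ./configure"
fi
```

For a very busy cache, pass 4096 instead of 1024 as the first argument.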
The configure Script
Like many other Unix software packages, Squid uses a ./configure script to learn about an operating system before compiling. The ./configure script is generated by the popular GNU autoconf program. When the script runs, it probes the system in various ways to find out about libraries, functions, types, parameters, and features that may or may not be present. One of the first things that ./configure does is look for a working C compiler. If the compiler can't be found or fails to compile a simple test program, the ./configure script can't proceed.
The ./configure script has a number of different options. The most important is the installation prefix. Before running ./configure, you need to decide where Squid should live. The installation prefix determines the default locations for the Squid logs, binaries, and configuration files. You can change the location for those files after installing, but it's easier if you decide now.
The default installation prefix is /usr/local/squid. Squid puts files in seven different subdirectories under the prefix:
% ls -l /usr/local/squid
total 5
drwxr-x---  2 wessels  wheel  512 Apr 28 20:42 bin
drwxr-x---  2 wessels  wheel  512 Apr 28 20:42 etc
drwxr-x---  2 wessels  wheel  512 Apr 28 20:42 libexec
drwxr-x---  3 wessels  wheel  512 Apr 28 20:43 man
drwxr-x---  2 wessels  wheel  512 Apr 28 20:42 sbin
drwxr-x---  4 wessels  wheel  512 Apr 28 20:42 share
drwxr-x---  4 wessels  wheel  512 Apr 28 20:43 var
Squid uses the bin, etc, libexec, man, sbin, and share directories for a few, relatively small files (or other directories) that don't change very often. The files under the var directory, however, are a different story. This is where you'll find Squid's log files, which may grow quite large (tens or hundreds of megabytes). var is also the default location for the actual disk cache. You may want to put var on a different partition with plenty of space. One easy way to do this is with the --localstatedir option:
% ./configure --localstatedir=/bigdisk/var
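Before pointing --localstatedir at another partition, it can help to confirm that the partition really has plenty of room. Here is a minimal sketch that uses the standard df command; the function name free_kb is invented for this example, and the root directory stands in for whatever mount point you have in mind (such as /bigdisk above).

```shell
#!/bin/sh
# free_kb DIR (hypothetical helper)
# Prints the space available, in kilobytes, on the filesystem that holds DIR.
free_kb() {
    # -P selects the portable one-line-per-filesystem format,
    # -k reports sizes in 1024-byte units; field 4 is "Available".
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Replace / with the partition you intend to use for var.
echo "/ has $(free_kb /) KB available"
```

The parsing assumes the filesystem name contains no spaces, which holds for most local disks.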
You don't need to worry too much about pathnames when configuring Squid. You can always change the pathnames later, in the
make
Once ./configure has done its job, you can simply type make to begin compiling the source code:
% make
Normally, this part goes smoothly. You'll see a lot of lines that look like this:
source='cbdata.c' object='cbdata.o' libtool=no  depfile='.deps/cbdata.Po'
tmpdepfile='.deps/cbdata.TPo'  depmode=gcc /bin/sh ../cfgaux/depcomp  gcc -DHAVE_
CONFIG_H -DDEFAULT_CONFIG_FILE=\"/usr/local/squid/etc/squid.conf\" -I. -I. -I../
include -I. -I. -I../include -I../include     -g -O2 -Wall -c `test -f cbdata.c ||
echo './'`cbdata.c
source='client_db.c' object='client_db.o' libtool=no  depfile='.deps/client_db.Po'
tmpdepfile='.deps/client_db.TPo'  depmode=gcc /bin/sh ../cfgaux/depcomp  gcc -DHAVE_
CONFIG_H -DDEFAULT_CONFIG_FILE=\"/usr/local/squid/etc/squid.conf\" -I. -I. -I../
include -I. -I. -I../include -I../include     -g -O2 -Wall -c `test -f client_db.c ||
echo './'`client_db.c
source='client_side.c' object='client_side.o' libtool=no  depfile='.deps/client_side.Po'
tmpdepfile='.deps/client_side.TPo'  depmode=gcc /bin/sh ../cfgaux/depcomp  gcc -
DHAVE_CONFIG_H -DDEFAULT_CONFIG_FILE=\"/usr/local/squid/etc/squid.conf\" -I. -I. -I../
include -I. -I. -I../include -I../include     -g -O2 -Wall -c `test -f client_side.c ||
echo './'`client_side.c
source='comm.c' object='comm.o' libtool=no  depfile='.deps/comm.Po' tmpdepfile='.
deps/comm.TPo'  depmode=gcc /bin/sh ../cfgaux/depcomp  gcc -DHAVE_CONFIG_H -DDEFAULT_
CONFIG_FILE=\"/usr/local/squid/etc/squid.conf\" -I. -I. -I../include -I. -I. -I../
include -I../include     -g -O2 -Wall -c `test -f comm.c || echo './'`comm.c
You may see some compiler warnings. In most cases, it is safe to ignore these. If you see a lot of them or something that looks really serious, report it to the developers as described in Section 16.5.
If the compilation gets all the way to the end without any errors, you can move to the next section, which describes how to install the programs you just built.
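Because the compiler output scrolls by so quickly, you may prefer to capture it in a file and rely on the exit status to tell you whether the build reached the end. The wrapper below is just one way to do that; run_logged and the log filename are names chosen for this sketch, not part of Squid.

```shell
#!/bin/sh
# run_logged LOGFILE COMMAND [ARGS...] (hypothetical helper)
# Runs COMMAND, saves its output in LOGFILE, and reports success or failure.
run_logged() {
    log=$1
    shift
    if "$@" >"$log" 2>&1; then
        echo "OK: $* (output saved in $log)"
    else
        status=$?    # exit status of the command that just failed
        echo "FAILED with status $status: $* (see $log)"
        return "$status"
    fi
}

# Usage from the top of the Squid source directory:
#   run_logged make.log make
```

Afterward, a command such as grep -c warning make.log gives a rough count of compiler warnings.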
To verify that compilation was successful, you can run make again. You should see this output:
% make
Making all in lib...
Making all in scripts...
Making all in src...
Making all in fs...
Making all in repl...
'squid' is up to date.
'client' is up to date.
'unlinkd' is up to date.
'cachemgr.cgi' is up to date.
Making all in icons...
Making all in errors...
Making all in auth_modules...
make install
After compiling, you need to install the programs into their permanent directories. Putting files in the installation directories might require superuser privileges. If so, become root first:
% su
Password:
# make install
If you enable Squid's ICMP measurement features with the --enable-icmp option, you must install the pinger program. The pinger program must be installed with superuser privileges because only root is allowed to send and receive ICMP messages. The following command installs pinger with the appropriate permissions:
# make install-pinger
After installing Squid, you should see the following directories and files listed under the installation prefix directory (/usr/local/squid by default):
sbin
The sbin directory contains programs normally started by root.
sbin/squid
This is the main Squid program.
bin
The bin directory contains programs for all users.
bin/RunCache
RunCache is a shell script you can use to start Squid. If Squid dies, this script automatically starts it again, unless it detects frequent restarts. The RunCache script is a relic from the time when Squid was not a daemon process. With the current versions, RunCache is less useful because Squid automatically restarts itself when you don't use the -N option.
bin/RunAccel
The RunAccel script is nearly identical to RunCache, except that it adds a command-line argument that tells Squid where to listen for HTTP requests.
bin/squidclient
squidclient is a simple HTTP client you can use to test Squid. It also has some special features for making management requests to a running Squid process.
libexec
The libexec directory traditionally contains helper programs. These are commands that you wouldn't normally run yourself. Rather, these programs are normally started by other programs.
libexec/unlinkd
unlinkd is a helper program that removes files from the cache directories. As you'll see later, file deletion can be a significant bottleneck. By implementing the delete operation in an external process, Squid achieves some performance gain.
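If you want to confirm that the install step put these files in place, a short loop over the names listed above will do. This is a sketch only: check_install is a name invented here, and the list covers just the files described in this section, not everything Squid installs.

```shell
#!/bin/sh
# check_install PREFIX (hypothetical helper)
# Reports any of the files described above that are missing under PREFIX.
check_install() {
    prefix=$1
    missing=0
    for f in sbin/squid bin/RunCache bin/RunAccel bin/squidclient libexec/unlinkd; do
        if [ ! -e "$prefix/$f" ]; then
            echo "missing: $prefix/$f"
            missing=1
        fi
    done
    # Return 0 only when every file was found.
    return "$missing"
}

# Usage with the default installation prefix:
#   check_install /usr/local/squid && echo "all expected files are present"
```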
Applying a Patch
After you've been running Squid for a while, you may find that you need to patch the source code to fix a bug or add an experimental feature. Patches are posted for important bug fixes on the squid-cache.org web site. If you don't want to wait for the next official release, you can download and apply the patch to your source code. You will then need to recompile Squid.
To apply a patch—also sometimes called a diff—you need a program called patch. Chances are that your operating system already has the patch program. If not, you can download it from the GNU collection (http://www.gnu.org/directory/patch.html). Note that if you're using anonymous CVS (see Section 2.4), you don't need to worry about patching files. The CVS system does it for you automatically when you update your tree.
To apply a patch, you need to save the patch file somewhere on your system. Then cd to the Squid source directory and run the command like this:
% cd squid-2.5.STABLE4
% patch < /tmp/patch_file
By default, the patch program tells you what it's doing as it runs. Usually this output scrolls by very quickly, unless there is a problem. You can safely ignore the warnings that say offset NNN lines. If you don't want to see all this output, use the -s option to make patch silent.
When patch updates the source files, it creates a backup copy of the original file. For example, if you're applying a patch to src/http.c, patch names the backup file src/http.c.orig. Thus, if you want to undo the patch after applying it, you can simply rename all the .orig files back to their former names. To use this technique successfully, it's a good idea to remove all .orig files before applying a patch.
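Renaming the backups by hand gets tedious when a patch touches many files. The sketch below automates the undo step with find and mv; restore_orig is a made-up name for this example. It assumes the backups carry the .orig suffix described above. Some versions of patch write backups only when asked (GNU patch has a -b option for this), so check that the .orig files exist before relying on it.

```shell
#!/bin/sh
# restore_orig DIR (hypothetical helper)
# Moves every *.orig backup below DIR back over the patched file.
restore_orig() {
    find "$1" -type f -name '*.orig' | while IFS= read -r backup; do
        # Strip the .orig suffix to get the name of the patched file.
        mv -f "$backup" "${backup%.orig}"
        echo "restored ${backup%.orig}"
    done
}

# Usage from the top of the Squid source directory:
#   restore_orig .
```

Files that the patch created from scratch have no meaningful backup and are left alone by this approach.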
If patch encounters a problem, it stops and prompts you for advice. Common problems are as follows:
  • Running patch from the wrong directory. To fix this problem, you may need to cd to a different directory or use patch's -p option.
  • Patch is already applied. patch can usually tell if the patch file has already been applied. In this case, it asks if you want to unpatch the file.
  • The patch program doesn't understand the file you are giving it. Patch files come in three flavors: normal, context, and unified. Old versions of
Running configure Later
Sometimes you may find it necessary to rerun ./configure. For example, if you tune your kernel parameters, you must run ./configure again so it picks up the new settings. As you read this book, you may also find that you want to use features that must be enabled with ./configure options.
To rerun ./configure with the same options, use this command:
% ./config.status --recheck
Another technique is to "touch" the config.status file, which updates its timestamp. This causes make to rerun the ./configure script before compiling the source code:
% touch config.status
% make
To add or remove ./configure options, you need to type in the whole command again. If you can't remember the previous options, just look at the top of the config.status file. For example:
% head config.status
#! /bin/sh
# Generated automatically by configure.
# Run this file to recreate the current configuration.
# This directory was configured as follows,
# on host foo.life-gone-hazy.com:
#
# ./configure  --enable-storeio=ufs,diskd --enable-carp \
#   --enable-auth-modules=NCSA
# Compiler output produced by configure, useful for debugging
# configure, is in ./config.log if it exists.
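If you would rather not pick the options out by eye, a few lines of awk can pull the commented command out of config.status and join its continuation lines. This sketch assumes the file has the layout shown above (a comment line starting with ./configure, continued with trailing backslashes); previous_configure is a name invented for the example.

```shell
#!/bin/sh
# previous_configure FILE (hypothetical helper)
# Prints the ./configure command recorded in the comments of FILE on one line.
previous_configure() {
    awk '
        /^# \.\/configure/ { grab = 1 }
        grab {
            line = $0
            sub(/^# */, "", line)            # drop the comment marker
            more = sub(/ *\\$/, "", line)    # 1 if the line ended with a backslash
            gsub(/  +/, " ", line)           # squeeze runs of spaces
            printf "%s%s", sep, line
            sep = " "
            if (!more) { print ""; exit }    # last line of the command
        }
    ' "$1"
}

# Usage from the top of the Squid source directory:
#   previous_configure config.status
```

You can then copy the printed command, add or remove options, and run it again.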
After rerunning ./configure, you must compile and install Squid again. To be safe, it's a good idea to run make clean first:
% make clean
% make
Recall that ./configure caches the things it discovers about your system. In some situations, you'll want to clear this cache and start the compilation process from the very beginning. You can simply remove the config.cache file if you like. Then, the next time ./configure runs, it won't use the previous values. You can also restore the Squid source tree to its preconfigure state with the following command:
% make distclean
This removes all object files and other files created by the ./configure and make commands.
Exercises
  • After compiling Squid, remove one or more of the .o files and run make again.
  • Use the ulimit or limits command to change the file descriptor limit to some small value before compiling Squid. Does ./configure obey or ignore your new limit?
  • Compile Squid with a high file-descriptor limit, then try to run it on a system with a lower limit. Does Squid use the lower or higher limit?
  • What happens if you mistype one of the --enable options? What if you specify an invalid storage scheme with the --enable-storeio option?
  • After compiling Squid, remove src/Makefile and try to compile it again. What's the easiest way to restore the file?
Chapter 4: Configuration Guide for the Eager
After compiling and installing Squid, your next task is to delve into the configuration file. If you're new to Squid, you're likely to find it a bit overwhelming. The most recent version has approximately 200 configuration file directives and 2700 lines of comments. I certainly don't expect you to read about, and configure, every directive before starting Squid. This chapter can help you get Squid running quickly.
All the squid.conf directives have default values. You might be able to get Squid going without even touching the configuration file. However, I don't recommend trying that. You'll be much happier if you read the following sections first.
If you are really turned off by Squid's configuration file syntax, you might want to try the Webmin graphical user interface. It allows you to configure Squid (and numerous other programs) from your web browser. See http://www.webmin.com and The Book of Webmin by Joe Cooper (No Starch Press) for more information.
The squid.conf Syntax
Squid's configuration file is relatively straightforward. It is similar in style to many other Unix programs. Each line begins with a configuration directive, followed by some number of values and/or keywords. Squid ignores empty lines and comment lines (beginning with #) when reading the configuration file. Here are some sample configuration lines:
cache_log /squid/var/cache.log

# define the localhost ACL
acl Localhost src 127.0.0.1/32

connect_timeout 2 minutes

log_fqdn on
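Since Squid skips empty lines and comment lines, it is sometimes handy to view a configuration file the same way, with only the active directives left. One rough approximation uses grep; active_directives is a name made up for this sketch, and the path in the usage line is simply the default location under /usr/local/squid.

```shell
#!/bin/sh
# active_directives FILE (hypothetical helper)
# Prints FILE without comment lines (beginning with #) and without blank lines.
active_directives() {
    grep -v -e '^#' -e '^[[:space:]]*$' "$1"
}

# Usage:
#   active_directives /usr/local/squid/etc/squid.conf
```

Applied to the sample lines above, this prints the four directives and drops the comment and the blank lines.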
Some directives take a single value. For these, repeating the directive with a different value overwrites the previous value. For example, there is only one connect_timeout value. The first line in the following example has no effect because the second line overwrites it:
connect_timeout 2 minutes
connect_timeout 1 hour
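To find out which value wins for a single-valued directive, you can mimic the overwrite rule by keeping only the last occurrence in the file. This is an illustration of the rule, not how Squid itself parses the file, and effective_value is a hypothetical name.

```shell
#!/bin/sh
# effective_value FILE DIRECTIVE (hypothetical helper)
# Prints the value from the last line in FILE that sets DIRECTIVE.
effective_value() {
    awk -v name="$2" '
        $1 == name {
            $1 = ""             # remove the directive name
            sub(/^ +/, "")      # and the space left behind
            value = $0          # later lines overwrite earlier ones
            found = 1
        }
        END { if (found) print value }
    ' "$1"
}

# Usage:
#   effective_value squid.conf connect_timeout
```

For the two connect_timeout lines above, the result is 1 hour.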
On the other hand, some directives are actually lists of values. For these, each occurrence of the directive adds a new value to the list. The extension_methods directive works this way:
extension_methods UNGET
extension_methods UNPUT
extension_methods UNPOST
For these list-based directives, you can also usually put multiple values on the same line:
extension_methods UNGET UNPUT UNPOST
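The two spellings produce the same list. A small awk sketch makes that visible by gathering every value of a list-based directive in file order; list_values is a name invented here, and the code only imitates the accumulation rule described above.

```shell
#!/bin/sh
# list_values FILE DIRECTIVE (hypothetical helper)
# Prints all values of DIRECTIVE in FILE, in order, on a single line.
list_values() {
    awk -v name="$2" '
        $1 == name {
            # Every field after the directive name is one list entry.
            for (i = 2; i <= NF; i++) {
                printf "%s%s", sep, $i
                sep = " "
            }
        }
        END { if (sep != "") print "" }
    ' "$1"
}

# Usage:
#   list_values squid.conf extension_methods
```

Whether the methods appear on three lines or one, the output is UNGET UNPUT UNPOST.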
Many of the directives have common types. The connect_timeout directive, for instance, takes a time specification: a number followed by a unit of time. For example:
connect_timeout 3 hours
client_lifetime 4 days
negative_ttl 27 minutes
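To get a feel for how large such values are, you can convert a time specification to seconds. The helper below is a sketch that handles only a handful of units (seconds, minutes, hours, and days); the name time_to_seconds is invented, and this is not Squid's own parser.

```shell
#!/bin/sh
# time_to_seconds NUMBER UNIT (hypothetical helper)
# Converts a time specification such as "2 minutes" to seconds.
time_to_seconds() {
    count=$1
    case $2 in
        second|seconds) factor=1 ;;
        minute|minutes) factor=60 ;;
        hour|hours)     factor=3600 ;;
        day|days)       factor=86400 ;;
        *) echo "unknown unit: $2" >&2; return 1 ;;
    esac
    echo $((count * factor))
}

time_to_seconds 3 hours      # prints 10800
time_to_seconds 4 days       # prints 345600
time_to_seconds 27 minutes   # prints 1620
```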
Similarly, a number of directives refer to the size of a file or chunk of memory. For these, you can write a size specification as a decimal number, followed by bytes, KB, MB, or GB. For example:
minimum_object_size 12 bytes
request_header_max_size 10 KB
maximum_object_size 187 MB
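The same kind of conversion works for size specifications. This sketch assumes binary multiples (1 KB taken as 1024 bytes), which is an assumption of the example rather than something stated above; size_to_bytes is a made-up name.

```shell
#!/bin/sh
# size_to_bytes NUMBER UNIT (hypothetical helper)
# Converts a size specification such as "10 KB" to bytes.
# Assumption: KB, MB, and GB are powers of 1024.
size_to_bytes() {
    count=$1
    case $2 in
        bytes) factor=1 ;;
        KB)    factor=1024 ;;
        MB)    factor=1048576 ;;
        GB)    factor=1073741824 ;;
        *) echo "unknown unit: $2" >&2; return 1 ;;
    esac
    echo $((count * factor))
}

size_to_bytes 12 bytes   # prints 12
size_to_bytes 10 KB      # prints 10240
size_to_bytes 187 MB     # prints 196083712
```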
Another type worth mentioning is the toggle, which can be either on or off. Many directives use this type. For example:
server_persistent_connections on
strip_query_terms off
prefer_direct on
In general, the configuration file directives may appear in any order. However, the order is important when one directive makes reference to something defined by another. Access controls are a good example. An
User IDs