book

Web Caching

by Duane Wessels

June 2001

Intermediate to advanced

320 pages

9h 18m

English

O'Reilly Media, Inc.

Read now

Unlock full access

Audience
Web SitesMailing Lists
Web ArchitectureClients and ServersProxiesWeb ObjectsResource Identifiers
HTTPFTPSSL/TLSGopher
LatencyBandwidthServer Load

Browser CachesCaching ProxiesSurrogates
HTTP RequestsOrigin Server RequestsProxy RequestsNon-HTTP Proxy Requests
Status CodesRequest MethodsExpiration and ValidationCache-controlAuthenticationCookiesDynamic Content
Last-modified TimestampsEntity TagsWeak and Strong Validators
The no-cache DirectiveThe max-age DirectiveThe min-fresh Directive
Least Recently Used (LRU)First In, First Out (FIFO)Least Frequently Used (LFU)SizeGreedyDual-Size (GDS)Other Algorithms
PrivacyAccess LogsMaking Requests Anonymous
Does Caching Infringe?Cases and PrecedentsThe DMCAHTTP’s Role
Java Applets
Proxy Addresses
Configuring Microsoft Internet ExplorerConfiguring Netscape NavigatorNCSA Mosaic, Lynx, and Wget
Writing a Proxy Auto-Configuration FunctionSample PAC ScriptsSetting the Proxy Auto-Configuration Script
Overview
Inline CachesLayer Four SwitchesWCCPCisco Policy Routing
LinuxipchainsiptablesFreeBSDOther Operating Systems
It’s Difficult for Users to BypassPacket Transport ServiceRouting ChangesIt Affects More Than Browsers and UsersNo-Intercept ListsAre Port 80 Packets Always HTTP?HTTP Interoperation ProblemsIP Interoperation Problems
Important HTTP HeadersDateLast-modifiedExpiresCache-controlContent-length
Why?LatencyHiding network failuresServer load reductionTen Ways to be Cache-FriendlyApacheThe Expires headerGeneral header manipulationSetting headers from CGI scriptsHow to Choose Expiration Times
What About Dynamic Responses?What About Advertisements?Getting Accurate Access Counts
How Hierarchies Work
PerformanceNondefault Routing
TrustLow Hit RatiosEffects on RoutingFreshnessLarge FamiliesAbuses, Real and ImaginedError MessagesFalse HitsForwarding LoopsFailures and Service Denial
ICPHistoryFeaturesHit predictionProbing the networkObject data with hitsSource RTT measurementsIssuesDelaysBandwidthFalse hitsUDPNo request methodQueries for uncachable responsesInteroperationUnwanted queriesMulticast ICP
Issues
Bloom FiltersComparing Digests and ICP
The Hot Spare
Appliance or Software SolutionAppliancesSoftware
What to Monitor?
UCD-SNMPRRDToolOther Tools
MetricsThroughputResponse TimeHit RatioConnection CapacityCost
Disk ThroughputCPU PowerNIC BandwidthMemoryNetwork State
Web PolygraphBlastWisconsin Proxy BenchmarkWebJammaOther Benchmarks
TCP Delayed ACKsPort Number ExhaustionNIC Duplex ModeBad Ethernet CablesFull CachesTest DurationLong-Lived ConnectionsSmall Working SetsClock SyncMSL (TIME_WAIT) Values
Configure SystemsTest the NetworkNo-Proxy TestFill the CacheRun the Benchmark
ThroughputResponse TimeHit RatioOther Results
Reply and Object Sizes
Client Request HeadersClient Reply Headers
Size and Popularity
ICPv2 Message FormatOpcodeVersionMessage LengthReqnumOptionsOption DataSender Host AddressPayload
PointersObject AdvertisementRequest NotificationObject Removal and InvalidationMD5 Object KeysEliminating URLs from RepliesWiretappingPrefetching
Membership Table
Message Format and Magic ConstantsHEADERDATAAUTH
COUNTSTRSPECIFIERDETAILIDENTITY
NOPTSTTST requestTST responseMONMON requestMON responseSETSET requestSET responseCLRCLR requestCLR response
The Cache Digest ImplementationKeysHash FunctionsSizing the FilterSelecting Objects for the DigestFalse Hits and Digest FreshnessExchanging Digests
1xx Intermediate Status
Books and Articles

Content preview from Web Caching

Web Transport Protocols

Clients and servers use a number of different transport protocols to exchange information. These protocols, built on top of TCP/IP, comprise the majority of all Internet traffic today. The Hypertext Transfer Protocol (HTTP) is the most common because it was designed specifically for the Web. A number of legacy protocols, such as the File Transfer Protocol (FTP) and Gopher, are still in use today. According to Merit’s measurements from the NSFNet, HTTP replaced FTP as the dominant protocol in April of 1995.^[2] Some newer protocols, such as Secure Sockets Layer (SSL) and the Real-time Transport Protocol (RTP), are increasing in use.

HTTP

Tim Berners-Lee and others originally designed HTTP to be a simple and lightweight transfer protocol. Since its inception, HTTP has undergone three major revisions. The very first version, retroactively named HTTP/0.9, is extremely simple and almost trivial to implement. At the same time, however, it lacks any real features. The second version, HTTP/1.0 [Berners-Lee, Fielding and Frystyk, 1996], defines a small set of features and still maintains the original goals of being simple and lightweight. However, at a time when the Web was experiencing phenomenal growth, many developers found that HTTP/1.0 did not provide all the functionality they required for new services.

The HTTP Working Group of the Internet Engineering Task Force (IETF) has worked long and hard on the protocol specification for HTTP/1.1. New features in this ...