Web Caching

Name: Web Caching
Author: Duane Wessels
ISBN: 9781565925366

by Duane Wessels

June 2001

Intermediate to advanced

320 pages

9h 18m

English

O'Reilly Media, Inc.

Web Caching
Preface
Audience
What You Will and Won’t Find Here
Caching Resources
Web Sites
Mailing Lists
Conventions Used in This Book
How To Contact Us
Acknowledgments
1. Introduction
Web Architecture
Clients and Servers
Proxies
Web Objects
Resource Identifiers
Web Transport Protocols
HTTP
FTP
SSL/TLS
Gopher
Why Cache the Web?
Latency
Bandwidth
Server Load

Why Not Cache the Web?
Types of Web Caches
Browser Caches
Caching Proxies
Surrogates
Caching Proxy Features
Meshes, Clusters, and Hierarchies
Products
2. How Web Caching Works
HTTP Requests
Origin Server Requests
Proxy Requests
Non-HTTP Proxy Requests
Is It Cachable?
Status Codes
Request Methods
Expiration and Validation
Cache-control
Authentication
Cookies
Dynamic Content
Hits, Misses, and Freshness
Hit Ratios
Validation
Last-modified Timestamps
Entity Tags
Weak and Strong Validators
Forcing a Cache to Refresh
The no-cache Directive
The max-age Directive
The min-fresh Directive
Cache Replacement
Least Recently Used (LRU)
First In, First Out (FIFO)
Least Frequently Used (LFU)
Size
GreedyDual-Size (GDS)
Other Algorithms
3. Politics of Web Caching
Privacy
Access Logs
Making Requests Anonymous
Request Blocking
Copyright
Does Caching Infringe?
Cases and Precedents
The DMCA
HTTP’s Role
Offensive Content
Dynamic Web Pages
Java Applets
Content Integrity
Cache Busting and Server Busting
Advertising
Trust
Effects of Proxies
4. Configuring Cache Clients
Proxy Addresses
Manual Proxy Configuration
Configuring Microsoft Internet Explorer
Configuring Netscape Navigator
NCSA Mosaic, Lynx, and Wget
Proxy Auto-Configuration Script
Writing a Proxy Auto-Configuration Function
Sample PAC Scripts
Setting the Proxy Auto-Configuration Script
Web Proxy Auto-Discovery
Other Configuration Options
The Bottom Line
5. Interception Proxying and Caching
Overview
The IP Layer: Routing
Inline Caches
Layer Four Switches
WCCP
Cisco Policy Routing
The TCP Layer: Ports and Delivery
Linux
ipchains
iptables
FreeBSD
Other Operating Systems
The Application Layer: HTTP
Debugging Interception
Issues
It’s Difficult for Users to Bypass
Packet Transport Service
Routing Changes
It Affects More Than Browsers and Users
No-Intercept Lists
Are Port 80 Packets Always HTTP?
HTTP Interoperation Problems
IP Interoperation Problems
To Intercept or Not To Intercept
6. Configuring Servers to Work with Caches
Important HTTP Headers
Date
Last-modified
Expires
Cache-control
Content-length
Being Cache-Friendly
Why?
Latency
Hiding network failures
Server load reduction
Ten Ways to be Cache-Friendly
Apache
The Expires header
General header manipulation
Setting headers from CGI scripts
How to Choose Expiration Times
Being Cache-Unfriendly
Other Issues for Content Providers
What About Dynamic Responses?
What About Advertisements?
Getting Accurate Access Counts
7. Cache Hierarchies
How Hierarchies Work
Why Join a Hierarchy?
Performance
Nondefault Routing
Why Not Join a Hierarchy?
Trust
Low Hit Ratios
Effects on Routing
Freshness
Large Families
Abuses, Real and Imagined
Error Messages
False Hits
Forwarding Loops
Failures and Service Denial
Optimizing Hierarchies
8. Intercache Protocols
ICP
History
Features
Hit prediction
Probing the network
Object data with hits
Source RTT measurements
Issues
Delays
Bandwidth
False hits
UDP
No request method
Queries for uncachable responses
Interoperation
Unwanted queries
Multicast ICP
CARP
HTCP
Issues
Cache Digests
Bloom Filters
Comparing Digests and ICP
Which Protocol to Use
9. Cache Clusters
The Hot Spare
Throughput and Load Sharing
Bandwidth
10. Design Considerations for Caching Services
Appliance or Software Solution
Appliances
Software
Disk Space
Memory
Network Interfaces
Operating Systems
High Availability
Intercepting Traffic
Load Sharing
Location
Using a Hierarchy
11. Monitoring the Health of Your Caches
What to Monitor?
Monitoring Tools
UCD-SNMP
RRDTool
Other Tools
12. Benchmarking Proxy Caches
Metrics
Throughput
Response Time
Hit Ratio
Connection Capacity
Cost
Performance Bottlenecks
Disk Throughput
CPU Power
NIC Bandwidth
Memory
Network State
Benchmarking Tools
Web Polygraph
Blast
Wisconsin Proxy Benchmark
WebJamma
Other Benchmarks
Benchmarking Gotchas
TCP Delayed ACKs
Port Number Exhaustion
NIC Duplex Mode
Bad Ethernet Cables
Full Caches
Test Duration
Long-Lived Connections
Small Working Sets
Clock Sync
MSL (TIME_WAIT) Values
How to Benchmark a Proxy Cache
Configure Systems
Test the Network
No-Proxy Test
Fill the Cache
Run the Benchmark
Sample Benchmark Results
Throughput
Response Time
Hit Ratio
Other Results
A. Analysis of Production Cache Trace Data
Reply and Object Sizes
Content Types
HTTP Headers
Client Request Headers
Client Reply Headers
Protocols
Port Numbers
Popularity
Size and Popularity
Cachability
Service Times
Hit Ratios
Object Life Cycle
Request Methods
Reply Status Code
B. Internet Cache Protocol
ICPv2 Message Format
Opcode
Version
Message Length
Reqnum
Options
Option Data
Sender Host Address
Payload
Opcodes
Option Flags
Experimental Features
Pointers
Object Advertisement
Request Notification
Object Removal and Invalidation
MD5 Object Keys
Eliminating URLs from Replies
Wiretapping
Prefetching
C. Cache Array Routing Protocol
Membership Table
Routing Function
Examples
D. Hypertext Caching Protocol
Message Format and Magic Constants
HEADER
DATA
AUTH
HTCP Data Types
COUNTSTR
SPECIFIER
DETAIL
IDENTITY
HTCP Opcodes
NOP
TST
TST request
TST response
MON
MON request
MON response
SET
SET request
SET response
CLR
CLR request
CLR response
E. Cache Digests
The Cache Digest Implementation
Keys
Hash Functions
Sizing the Filter
Selecting Objects for the Digest
False Hits and Digest Freshness
Exchanging Digests
Message Format
An Example
F. HTTP Status Codes
1xx Intermediate Status
2xx Successful Response
3xx Redirects
4xx Request Errors
5xx Server Errors
G. U.S.C. 17 Sec. 512. Limitations on Liability Relating to Material Online
List of Acronyms
H. Bibliography
Books and Articles
Request For Comments
Index
Colophon

Content preview from Web Caching

Appendix A. Analysis of Production Cache Trace Data

In this appendix, we’ll look at some interesting characteristics of web traffic, such as reply size distributions, HTTP headers, and expiration times. Such data is useful for a number of reasons. First, the information in this appendix backs up some of the statements I made earlier in the book. For example, when I said that small files are more popular than large ones, I wasn’t just making that up. Second, this data can help you make decisions regarding your own caching proxies. For example, the hit ratio analysis demonstrates how increasing your cache size may result in higher hit ratios.
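
The relationship between cache size and hit ratio can be explored with a small simulation. The following Python sketch is illustrative only: it replays a synthetic request trace through an LRU cache of varying capacity. The trace generator, the 1/rank popularity weights, and the function names are assumptions made for this example, not the data or tools used in this appendix, and capacity is counted in objects rather than bytes.

```python
import random
from collections import OrderedDict


def lru_hit_ratio(trace, capacity):
    """Replay `trace` (a non-empty sequence of object keys) through an LRU
    cache holding at most `capacity` objects; return the fraction of hits."""
    cache = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)          # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict the least recently used
    return hits / len(trace)


def synthetic_trace(n_requests, n_objects, seed=0):
    """Hypothetical workload: object popularity falls off as 1/rank.
    This is an assumption for illustration, not measured traffic."""
    rng = random.Random(seed)
    weights = [1.0 / rank for rank in range(1, n_objects + 1)]
    return rng.choices(range(n_objects), weights=weights, k=n_requests)


trace = synthetic_trace(20_000, 2_000)
ratios = {size: lru_hit_ratio(trace, size) for size in (10, 100, 1000)}
for size, ratio in ratios.items():
    print(f"cache of {size:>5} objects: hit ratio {ratio:.2%}")
```

Because LRU never does worse on the same trace when given more room, the printed hit ratios do not decrease as the capacity grows. How quickly they rise depends entirely on the workload, which is why measurements from production caches are more informative than a synthetic trace like this one.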

For these analyses, I use data from two different sources. One is the NLANR/IRCache project, consisting of nine caches I maintain throughout the U.S.^[33] The other is a proxy cache located at a U.S. university, which I’ll call Anon-U. All data comes from production Squid caches with real users.

I use the IRCache data for most analyses because it is significantly larger and includes more information. The IRCache set includes client access logs, cache “store” logs, and HTTP header logs. The access logs are from March 5–25, 2000, and contain 216 million responses. The store logs are from March 8–25, 2000, and contain 71 million entries. The header logs are from April 2–29, 2000, and contain 268 million request and response entries.
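
As a rough illustration of the kind of statistic that can be drawn from client access logs, the Python sketch below computes a request hit ratio and a byte hit ratio from lines in Squid's native access log layout (timestamp, elapsed time, client, result code/status, bytes, method, URL, and so on). The sample lines are invented for the example, and the classification rule (any `TCP_` result code containing `HIT` counts as a hit) is a simplifying assumption rather than the method used for the analyses in this appendix.

```python
def summarize_access_log(lines):
    """Return request and byte hit ratios for Squid native-format log lines.

    Only TCP_* entries are counted; UDP_* entries (ICP queries) are skipped.
    Treating every result code that contains "HIT" as a hit is a
    simplification made for this sketch.
    """
    requests = hits = total_bytes = hit_bytes = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue                        # skip malformed or short lines
        result_code = fields[3].split("/")[0]
        if not result_code.startswith("TCP_"):
            continue
        try:
            size = int(fields[4])
        except ValueError:
            continue
        requests += 1
        total_bytes += size
        if "HIT" in result_code:
            hits += 1
            hit_bytes += size
    return {
        "requests": requests,
        "hit_ratio": hits / requests if requests else 0.0,
        "byte_hit_ratio": hit_bytes / total_bytes if total_bytes else 0.0,
    }


# Invented sample lines, for illustration only.
sample = [
    "952300000.123     45 10.0.0.1 TCP_HIT/200 1500 GET http://example.com/a.gif - NONE/- image/gif",
    "952300001.456    320 10.0.0.2 TCP_MISS/200 8500 GET http://example.com/b.html - DIRECT/192.0.2.10 text/html",
    "952300002.789     12 10.0.0.1 TCP_MEM_HIT/200 1500 GET http://example.com/a.gif - NONE/- image/gif",
    "952300003.000    210 10.0.0.3 TCP_MISS/404 500 GET http://example.com/missing - DIRECT/192.0.2.10 text/html",
]
summary = summarize_access_log(sample)
print(summary)
```

The two ratios differ because hits tend to be small objects: in the sample, half of the requests are hits, but they account for only a quarter of the bytes.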

The IRCache proxies are unique in certain ways that can skew the data. In other words, the data collected from these proxies ...

Publisher Resources

ISBN: 156592536X
