
Web Caching

Name: Web Caching
Author: Duane Wessels
ISBN: 9781565925366


June 2001

Intermediate to advanced

320 pages

9h 18m

English

O'Reilly Media, Inc.


Web Caching
Preface
Audience
What You Will and Won’t Find Here
Caching Resources
Web Sites
Mailing Lists
Conventions Used in This Book
How To Contact Us
Acknowledgments
1. Introduction
Web Architecture
Clients and Servers
Proxies
Web Objects
Resource Identifiers
Web Transport Protocols
HTTP
FTP
SSL/TLS
Gopher
Why Cache the Web?
Latency
Bandwidth
Server Load
Why Not Cache the Web?
Types of Web Caches
Browser Caches
Caching Proxies
Surrogates
Caching Proxy Features
Meshes, Clusters, and Hierarchies
Products
2. How Web Caching Works
HTTP Requests
Origin Server Requests
Proxy Requests
Non-HTTP Proxy Requests
Is It Cachable?
Status Codes
Request Methods
Expiration and Validation
Cache-control
Authentication
Cookies
Dynamic Content
Hits, Misses, and Freshness
Hit Ratios
Validation
Last-modified Timestamps
Entity Tags
Weak and Strong Validators
Forcing a Cache to Refresh
The no-cache Directive
The max-age Directive
The min-fresh Directive
Cache Replacement
Least Recently Used (LRU)
First In, First Out (FIFO)
Least Frequently Used (LFU)
Size
GreedyDual-Size (GDS)
Other Algorithms
3. Politics of Web Caching
Privacy
Access Logs
Making Requests Anonymous
Request Blocking
Copyright
Does Caching Infringe?
Cases and Precedents
The DMCA
HTTP’s Role
Offensive Content
Dynamic Web Pages
Java Applets
Content Integrity
Cache Busting and Server Busting
Advertising
Trust
Effects of Proxies
4. Configuring Cache Clients
Proxy Addresses
Manual Proxy Configuration
Configuring Microsoft Internet Explorer
Configuring Netscape Navigator
NCSA Mosaic, Lynx, and Wget
Proxy Auto-Configuration Script
Writing a Proxy Auto-Configuration Function
Sample PAC Scripts
Setting the Proxy Auto-Configuration Script
Web Proxy Auto-Discovery
Other Configuration Options
The Bottom Line
5. Interception Proxying and Caching
Overview
The IP Layer: Routing
Inline Caches
Layer Four Switches
WCCP
Cisco Policy Routing
The TCP Layer: Ports and Delivery
Linux
ipchains
iptables
FreeBSD
Other Operating Systems
The Application Layer: HTTP
Debugging Interception
Issues
It’s Difficult for Users to Bypass
Packet Transport Service
Routing Changes
It Affects More Than Browsers and Users
No-Intercept Lists
Are Port 80 Packets Always HTTP?
HTTP Interoperation Problems
IP Interoperation Problems
To Intercept or Not To Intercept
6. Configuring Servers to Work with Caches
Important HTTP Headers
Date
Last-modified
Expires
Cache-control
Content-length
Being Cache-Friendly
Why?
Latency
Hiding network failures
Server load reduction
Ten Ways to be Cache-Friendly
Apache
The Expires header
General header manipulation
Setting headers from CGI scripts
How to Choose Expiration Times
Being Cache-Unfriendly
Other Issues for Content Providers
What About Dynamic Responses?
What About Advertisements?
Getting Accurate Access Counts
7. Cache Hierarchies
How Hierarchies Work
Why Join a Hierarchy?
Performance
Nondefault Routing
Why Not Join a Hierarchy?
Trust
Low Hit Ratios
Effects on Routing
Freshness
Large Families
Abuses, Real and Imagined
Error Messages
False Hits
Forwarding Loops
Failures and Service Denial
Optimizing Hierarchies
8. Intercache Protocols
ICP
History
Features
Hit prediction
Probing the network
Object data with hits
Source RTT measurements
Issues
Delays
Bandwidth
False hits
UDP
No request method
Queries for uncachable responses
Interoperation
Unwanted queries
Multicast ICP
CARP
HTCP
Issues
Cache Digests
Bloom Filters
Comparing Digests and ICP
Which Protocol to Use
9. Cache Clusters
The Hot Spare
Throughput and Load Sharing
Bandwidth
10. Design Considerations for Caching Services
Appliance or Software Solution
Appliances
Software
Disk Space
Memory
Network Interfaces
Operating Systems
High Availability
Intercepting Traffic
Load Sharing
Location
Using a Hierarchy
11. Monitoring the Health of Your Caches
What to Monitor?
Monitoring Tools
UCD-SNMP
RRDTool
Other Tools
12. Benchmarking Proxy Caches
Metrics
Throughput
Response Time
Hit Ratio
Connection Capacity
Cost
Performance Bottlenecks
Disk Throughput
CPU Power
NIC Bandwidth
Memory
Network State
Benchmarking Tools
Web Polygraph
Blast
Wisconsin Proxy Benchmark
WebJamma
Other Benchmarks
Benchmarking Gotchas
TCP Delayed ACKs
Port Number Exhaustion
NIC Duplex Mode
Bad Ethernet Cables
Full Caches
Test Duration
Long-Lived Connections
Small Working Sets
Clock Sync
MSL (TIME_WAIT) Values
How to Benchmark a Proxy Cache
Configure Systems
Test the Network
No-Proxy Test
Fill the Cache
Run the Benchmark
Sample Benchmark Results
Throughput
Response Time
Hit Ratio
Other Results
A. Analysis of Production Cache Trace Data
Reply and Object Sizes
Content Types
HTTP Headers
Client Request Headers
Client Reply Headers
Protocols
Port Numbers
Popularity
Size and Popularity
Cachability
Service Times
Hit Ratios
Object Life Cycle
Request Methods
Reply Status Code
B. Internet Cache Protocol
ICPv2 Message Format
Opcode
Version
Message Length
Reqnum
Options
Option Data
Sender Host Address
Payload
Opcodes
Option Flags
Experimental Features
Pointers
Object Advertisement
Request Notification
Object Removal and Invalidation
MD5 Object Keys
Eliminating URLs from Replies
Wiretapping
Prefetching
C. Cache Array Routing Protocol
Membership Table
Routing Function
Examples
D. Hypertext Caching Protocol
Message Format and Magic Constants
HEADER
DATA
AUTH
HTCP Data Types
COUNTSTR
SPECIFIER
DETAIL
IDENTITY
HTCP Opcodes
NOP
TST
TST request
TST response
MON
MON request
MON response
SET
SET request
SET response
CLR
CLR request
CLR response
E. Cache Digests
The Cache Digest Implementation
Keys
Hash Functions
Sizing the Filter
Selecting Objects for the Digest
False Hits and Digest Freshness
Exchanging Digests
Message Format
An Example
F. HTTP Status Codes
1xx Intermediate Status
2xx Successful Response
3xx Redirects
4xx Request Errors
5xx Server Errors
G. U.S.C. 17 Sec. 512. Limitations on Liability Relating to Material Online
List of Acronyms
H. Bibliography
Books and Articles
Request For Comments
Index
Colophon

Content preview from Web Caching

Types of Web Caches

Web content can be cached at a number of different locations along the path between a client and an origin server. First, many browsers and other user agents have built-in caches. For simplicity, I’ll call these browser caches. Next, a caching proxy (a.k.a. “proxy cache”) aggregates all of the requests from a group of clients. Lastly, a surrogate can be located in front of an origin server to cache popular responses. In this book, we’ll spend more time talking about caching proxies than the others.

Browser Caches

Browsers and other user agents benefit from having a built-in cache. When you press the Back button on your browser, it reads the previous page from its cache. Nongraphical agents, such as web crawlers, cache objects as temporary files on disk rather than keeping them in memory.

Netscape Navigator lets you control exactly how much memory and disk space to use for caching, and it also allows you to flush the cache. Microsoft Internet Explorer lets you control the size of your local disk cache, but in a less flexible way. Both have controls for how often cached responses should be validated. People generally use 10–100MB of disk space for their browser cache.

A browser cache is limited to just one user, or at least one user agent. Thus, it gets hits only when the user revisits a page. As we’ll see later, browser caches can store “private” responses, but shared caches cannot.
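The distinction between private and shared storage can be illustrated with a small check on the Cache-Control response header. The following is only a minimal sketch, not code from the book: the function names `parse_cache_control` and `may_store` are hypothetical, the comma splitting is deliberately naive (it does not handle quoted directive values that contain commas), and a real cache considers many more factors, such as status codes, request methods, and authentication, before storing a response.

```python
def parse_cache_control(header_value: str) -> set:
    """Return the lowercased directive names found in a Cache-Control value.

    Naive parsing: splits on commas and ignores any "=value" part.
    """
    names = set()
    for part in header_value.split(","):
        name = part.strip().split("=", 1)[0].strip().lower()
        if name:
            names.add(name)
    return names


def may_store(cache_control: str, shared_cache: bool) -> bool:
    """Decide, from Cache-Control alone, whether a cache may store a response.

    shared_cache=False models a browser cache (one user);
    shared_cache=True models a caching proxy or surrogate (many users).
    """
    directives = parse_cache_control(cache_control)
    if "no-store" in directives:
        # No cache of any kind may store the response.
        return False
    if shared_cache and "private" in directives:
        # "private" responses are intended for a single user only.
        return False
    return True


if __name__ == "__main__":
    header = "private, max-age=60"
    print("browser cache may store:", may_store(header, shared_cache=False))
    print("shared cache may store:", may_store(header, shared_cache=True))
```

With the header `private, max-age=60`, the sketch allows the browser cache to keep the response and refuses it for the shared cache, which matches the rule stated above.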

Caching Proxies

Caching proxies, unlike browser caches, service many different ...


Publisher Resources

ISBN: 156592536X
