Why Not Cache the Web?
By now, you may have the impression that web caching is a wonderful solution without any negative side effects. In fact, there are a number of important issues and consequences to understand about web caching. I’ll mention some of them here, with a deeper discussion to follow in Chapter 3.
Unlike more tightly coupled systems, a web cache has difficulty guaranteeing consistency. This means that a cache might return out-of-date information to a user. Why should this be the case? One important factor is that web servers provide only weak hints about freshness, and many responses don’t have any hints at all. On-demand validation is the only way to guarantee that a cached response is up-to-date. Given the relatively high latencies involved (compared to other systems), validation can take a significant amount of time. Furthermore, the cache may not even be able to reach the server due to a network or server failure. If a validation request fails, the cache cannot know whether its response is up-to-date. Some caching products can even be configured to return stale responses intentionally.
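To make the consistency problem concrete, here is a minimal sketch in Python of the decision a cache faces for each stored response. It is an illustration under stated assumptions, not a description of any particular caching product. The names `CachedResponse`, `freshness_lifetime`, `is_fresh`, and `serve` are hypothetical, as is the `allow_stale_on_error` setting. The rule of guessing a lifetime as 10% of the time since the object was last modified is also an assumption, chosen only to show how weak a freshness hint can be. The `validate` argument stands in for an on-demand validation request to the origin server.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class CachedResponse:
    """Hypothetical cache entry; all times are seconds since the epoch."""
    body: bytes
    stored_at: float                       # when the cache stored the response
    max_age: Optional[float] = None        # explicit lifetime, if the server sent one
    last_modified: Optional[float] = None  # modification time, if the server sent one


def freshness_lifetime(entry: CachedResponse, lm_factor: float = 0.1) -> Optional[float]:
    """Return a freshness lifetime in seconds, or None when there is no hint at all."""
    if entry.max_age is not None:
        return entry.max_age
    if entry.last_modified is not None:
        # Assumed heuristic: guess a fraction of how long the object had
        # been unchanged when it was stored. This is only a weak hint.
        return lm_factor * max(0.0, entry.stored_at - entry.last_modified)
    return None


def is_fresh(entry: CachedResponse, now: float) -> bool:
    """A response with no freshness hint is never treated as fresh here."""
    lifetime = freshness_lifetime(entry)
    return lifetime is not None and (now - entry.stored_at) < lifetime


def serve(
    entry: CachedResponse,
    now: float,
    validate: Callable[[CachedResponse], bool],
    allow_stale_on_error: bool = False,
) -> Tuple[Optional[bytes], str]:
    """Return (body, status), validating with the origin when the entry is not fresh."""
    if is_fresh(entry, now):
        return entry.body, "hit"
    try:
        still_valid = validate(entry)  # may be slow, and may fail outright
    except OSError:
        # The validation request failed, so the cache cannot tell
        # whether its copy is up-to-date.
        if allow_stale_on_error:
            return entry.body, "stale"
        return None, "error"
    if still_valid:
        return entry.body, "validated"
    return None, "refetch"
```

Only the "validated" status comes with a guarantee. A "hit" relies on the server's hint being accurate, and a "stale" result is the configured choice to return a response that may be out of date rather than an error.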
If you’ve ever set up and maintained a web server, you understand how good it feels to watch the access log file and see people visiting your site. Many content providers feel the same way. They want to know exactly who their users are, which pages they view, and how often. Caches complicate their analysis. Requests served as cache hits are not logged at the origin server. ...