Measurement solutions based on web server logfiles suffer from a variety of factors that decrease their accuracy. Caching devices are the primary culprits but, in some cases, the cache can be beaten and accuracy improved.
Web server logfiles suffer from a handful of accuracy issues, perhaps the most significant arising from caching devices on the Internet. A caching device is any piece of hardware or software designed to store temporary copies of a file, most often to improve delivery performance. There are two types of caching devices that create problems for web server logfiles: client-side caches and server-side caches.
Client-side caches are deployed locally in corporate network operation centers and at Internet Service Providers to improve performance. The most extreme example of a client-side cache is the browser cache, software built into your Internet browser that is designed to save local copies of files. Server-side caches are often placed in front of your own web servers to reduce load. (See Web Caching [O’Reilly] for a complete treatise on the subject, or, if you prefer going online, Wikipedia has an excellent entry on the subject at http://en.wikipedia.org/wiki/Web_cache.)
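To make the effect on a logfile concrete, here is a minimal sketch in Python of a cache sitting between visitors and a web server. The names `OriginServer`, `Cache`, and `simulate` are hypothetical and invented for illustration; real caches also honor expiry and validation rules, which this toy model ignores.

```python
from dataclasses import dataclass, field


@dataclass
class OriginServer:
    """Hypothetical web server that logs every request it actually receives."""
    log: list = field(default_factory=list)

    def serve(self, url: str) -> str:
        self.log.append(url)  # only requests that reach the server are logged
        return f"<html>content of {url}</html>"


@dataclass
class Cache:
    """Hypothetical cache (browser, proxy, or server-side) in front of the origin."""
    origin: OriginServer
    store: dict = field(default_factory=dict)

    def get(self, url: str) -> str:
        # On a miss, fetch from the origin and keep a copy.
        # On a hit, answer from the copy; the origin never sees the request.
        if url not in self.store:
            self.store[url] = self.origin.serve(url)
        return self.store[url]


def simulate(views: list) -> tuple:
    """Return (actual page views, page views recorded in the server log)."""
    origin = OriginServer()
    cache = Cache(origin)
    for url in views:
        cache.get(url)
    return len(views), len(origin.log)


views = ["/index.html"] * 5 + ["/about.html"] * 3
print(simulate(views))  # → (8, 2)
```

In this simplified model, eight page views produce only two log entries, one per distinct URL, because every repeat view is answered from the stored copy.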
The essentials of caching are as follows: when a document is served from a cache rather than from your web server, the request never reaches the server and therefore never appears in the web server log. Depending on how many of your pages are cached, the result can be a dramatic undercounting of page views, which then cascades ...