Learning how to talk the web measurement talk is the first step in really taking advantage of the data, especially if your hope is to someday become a professional “web data analyst.”
In web measurement, terminology is tremendously important. Because so few people have experience measuring activity on the Internet, it is worth explaining the key terms and how they’re used. If you’re technically inclined, this hack is designed to help you understand how the bits and bytes are translated into information about human activities. If you’re more marketing oriented, this hack will help you understand where the information comes from.
Figure 1-1 illustrates the relationship between the basic terms. As you can see, as the volume of available data decreases, the value of that information increases. At the bottom of the pyramid and in greatest volume, we have “hits,” and at the top, we have “unique visitors,” the holy grail of “things that can be measured.”
The term hit is perhaps the most overused and misunderstood word in the entire web measurement vocabulary. People talk about “site hits,” “page hits,” and “hits from search engines” ad nauseam. The best definition of a hit is provided by WebTrends:
When you read the definition of a page view, you’ll be struck by the similarity of the two definitions, but consider the words “or downloads a file.” Files, in this context, include executable files; PDFs; sound files; JPEG, PNG, and GIF images; etc. The problem is that the “page” that appears in your web browser is technically the aggregate of potentially hundreds of “hits”—every image and page element is counted as a hit.
So if every page load records some number of hits, and that number varies with how many images and other files are needed to render the page, how can one reasonably expect to use hits in a business context?
You can’t. Don’t try.
The best you can do with a “hit” is to recognize that it’s simply one of those words that people misuse and move on. Use words like “page views” and “referrals from search engines,” and you’ll be talking the talk. In web measurement, “hits” is an anachronism; the term’s time has come and gone.
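To see why hits are so slippery, the following Python sketch counts the requests that a single, hypothetical page load might generate. The file paths are invented for illustration, and the rule that only .htm and .html requests are pages is a simplifying assumption, not how any particular tool decides.

```python
from pathlib import PurePosixPath

# Hypothetical requests logged while a browser renders one page:
# the HTML document plus every image, stylesheet, and script it references.
REQUESTS = [
    "/index.htm",
    "/images/logo.gif",
    "/images/banner.jpg",
    "/images/photo.png",
    "/css/site.css",
    "/js/menu.js",
]

# Simplifying assumption: only these file extensions count as pages.
PAGE_EXTENSIONS = {".htm", ".html"}


def count_hits(requests):
    """Every request for any file is a hit."""
    return len(requests)


def count_page_views(requests):
    """Only requests for page documents count as page views."""
    return sum(
        1 for path in requests
        if PurePosixPath(path).suffix.lower() in PAGE_EXTENSIONS
    )


if __name__ == "__main__":
    print("hits:", count_hits(REQUESTS))              # 6
    print("page views:", count_page_views(REQUESTS))  # 1
```

Adding one more image to the page template would raise the hit count to 7 while the page view count stays at 1, which is exactly why hits make a poor business metric.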
The page view is the fundamental unit in web measurement, ideally recorded when a person sees a web page. Page views are the measurement of a visitor’s interest in your site and the basis for a visitor’s clickstream, the sequential list of pages a visitor sees during his visit.
In their recent document Interactive Audience Measurement and Advertising Campaign Reporting and Audit Guidelines, the Interactive Advertising Bureau (IAB), a governing body for Internet advertising measurement standards, had the following to say about page views:
Page [views] are defined as measurement of responses from a web server to a page request from the user browser, which is filtered to remove robotic activity and error codes prior to reporting, and is recorded at a point as close as possible to opportunity to see the page by the user.
For the purposes of this book, the definition of page view is:
A page view is counted with the successful loading of any document containing content that was requested by a web site visitor, regardless of the mechanism of delivery or the number and frequency with which said content is requested.
While there are a number of problems associated with how page views are defined and used in the web measurement market, it’s tremendously important to understand the general concept. Page views, in practical usage, provide an easy way to convey the popularity of a page or section of your site. While not as people-centric as visits and unique visitors, page view is a term you’ll use frequently when talking the talk.
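Here is a minimal sketch, assuming log records have already been parsed into (path, status, user agent) tuples, of how the IAB’s filtering of robotic activity and error codes might turn raw requests into page views per page. The sample records, the short list of robot signatures, and the .htm/.html page rule are all hypothetical; real robot detection relies on much longer, maintained lists.

```python
from collections import Counter

# Hypothetical, pre-parsed log records: (requested path, HTTP status, user agent).
RECORDS = [
    ("/index.htm", 200, "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"),
    ("/products.htm", 200, "Mozilla/5.0 (Windows; U; Windows NT 5.1)"),
    ("/index.htm", 200, "Googlebot/2.1 (+http://www.google.com/bot.html)"),
    ("/missing.htm", 404, "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"),
    ("/index.htm", 200, "Mozilla/5.0 (Macintosh; U; PPC Mac OS X)"),
    ("/images/logo.gif", 200, "Mozilla/5.0 (Macintosh; U; PPC Mac OS X)"),
]

# Assumptions for this sketch: a tiny, illustrative list of robot signatures
# and a simple rule that pages end in .htm or .html.
ROBOT_SIGNATURES = ("bot", "crawler", "spider", "slurp")
PAGE_SUFFIXES = (".htm", ".html")


def is_robot(user_agent):
    """Crude check: does the user agent contain a known robot signature?"""
    ua = user_agent.lower()
    return any(sig in ua for sig in ROBOT_SIGNATURES)


def page_views_by_page(records):
    """Count page views per page, dropping robots, errors, and non-page files."""
    counts = Counter()
    for path, status, user_agent in records:
        if not 200 <= status < 300:
            continue  # filter error codes (and redirects) before reporting
        if is_robot(user_agent):
            continue  # filter robotic activity
        if not path.lower().endswith(PAGE_SUFFIXES):
            continue  # images, scripts, etc. are hits, not page views
        counts[path] += 1
    return counts


if __name__ == "__main__":
    for page, views in page_views_by_page(RECORDS).most_common():
        print(page, views)
```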
A visit, also referred to as a session or user session, is generally defined by the collection of pages viewed when someone browses a web site (the “clickstream”). It is defined by the IAB (in particularly droll language) as:
One or more text and/or graphics downloads from a site qualifying as at least one page [view], without 30 consecutive minutes of inactivity, which can be reasonably attributed to a single browser for a single session.
While this concept is not particularly complex, ambiguity arises when you consider how people browse web sites. Consider two examples:
Tammie enters a URL into her browser and methodically clicks links, completing her given task in a reasonable amount of time and then moving on to the next site, hopefully satisfied.
Tom enters a URL into his browser and drifts around, randomly clicking links, taking breaks of varying duration to drink coffee, make lunch, and chat on the phone, and coming and going willy-nilly for hours on end.
Both are reasonable and common strategies for using the Internet. Unfortunately, while it is easy to know when Tammie’s visit ends—when she has completed her specific task—the same determination is difficult to make for Tom. Because it is nearly impossible to determine the intent of a web visitor, certain assumptions are required. A fundamental assumption is that any visitor who fails to click for more than 30 minutes has mentally “moved on,” and her visit should be considered ended.
Why 30 minutes, you ask? An excellent question! Unfortunately, it’s one without an answer; suffice it to say that 30 minutes is a widely used standard for visit expiration, something worth remembering when you want to talk the talk.
The most useful definition of a visit is as follows:
A visit is counted when a unique visitor creates activity on a web site, measured using sequential page views (clickstream), regardless of the duration of this activity as long as the period of inactivity between page views does not extend beyond 30 minutes.
You’ll see that there is no upper limit on the length of a visit: one visitor can click around for as long as he pleases, as long as he clicks a measured link at least once every 29 minutes and 59 seconds. Visitors can visit a site multiple times a day; the ratio of visits to visitors is a great key performance indicator [Hack #94]. Visits are tied to referring sources like paid and natural search terms [Hacks #42 and #43] and banner ad campaigns [Hack #40]. Visits bridge the gap to truly meaningful information about real people.
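The 30-minute rule can be sketched in a few lines of Python. This is an illustration under assumptions, not any vendor’s algorithm: the timestamps for Tammie and Tom are invented, and, following the IAB’s “30 consecutive minutes of inactivity,” a gap of exactly 30 minutes is treated as ending the visit.

```python
from datetime import datetime, timedelta

# The widely used (if arbitrary) visit expiration threshold.
VISIT_TIMEOUT = timedelta(minutes=30)


def split_into_visits(timestamps, timeout=VISIT_TIMEOUT):
    """Group one visitor's page-view timestamps into visits.

    A page view continues the current visit only if it arrives less than
    `timeout` after the previous page view; there is no cap on total length.
    """
    visits = []
    for ts in sorted(timestamps):
        if visits and ts - visits[-1][-1] < timeout:
            visits[-1].append(ts)  # same visit: the gap is under 30 minutes
        else:
            visits.append([ts])    # first page view, or 30+ minutes idle
    return visits


def at(hhmm):
    """Hypothetical helper: a timestamp on an arbitrary example day."""
    return datetime.strptime("2005-03-01 " + hhmm, "%Y-%m-%d %H:%M")


# Hypothetical clickstreams for the two visitors described above.
tammie = [at("09:00"), at("09:02"), at("09:05"), at("09:09")]
tom = [at("09:00"), at("09:20"), at("09:49"), at("10:45"), at("11:10"), at("13:30")]

if __name__ == "__main__":
    print("Tammie:", len(split_into_visits(tammie)), "visit(s)")  # 1
    print("Tom:", len(split_into_visits(tom)), "visit(s)")        # 3
```

Tom’s six page views span four and a half hours but collapse into three visits, because two of his breaks lasted longer than 30 minutes; Tammie’s tidy clickstream is a single visit.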
In the field of web site measurement, people are called “unique visitors.” Unique visitors are the top of the pyramid model of web measurement data (Figure 1-1) and exist in three forms: totally anonymous, mostly anonymous, and known [Hack #5]. The important thing to remember about unique visitors is that they are human beings, not nonhuman user agents [Hack #23].
In terms of a strict definition of unique visitor, the IAB has this to say:
Unique [visitors] represent the number of actual individual people, within a designated reporting timeframe, with activity consisting of one or more visits to a site or the delivery of pushed content…. Each individual is counted only once in the unique [visitor] measures for the reporting period.
Again, while using the least engaging language possible, the IAB has captured the essence of the unique visitor. Especially important is the concept of timeframe and the relationship between unique visitors and visits. I think the best definition of a unique visitor is as follows:
A unique visitor is counted when a human being uses a web browser to visit a web site, regardless of the number of pages visited or the duration of the visit. A visitor can be unique for different periods of time, and the individuality of a visitor is preferably defined by a truly unique user identifier shared between browsers. A unique visitor for any arbitrary timeframe should be counted one time and one time only on her first visit between the start and end dates.
As long as you remember that unique visitors are people just like you and me, you’ll be fine. If you remember that the uniqueness of visitors is associated with a specific timeframe—the day, the week, the month, or the football season—you’re golden.
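The dependence on timeframe is easiest to see with a small example. This sketch assumes each visit has already been attributed to a visitor identifier (the cookie values and dates are hypothetical) and counts each identifier once per reporting period; it also computes the visits-to-visitors ratio mentioned earlier.

```python
from datetime import date

# Hypothetical visit records: (visitor identifier such as a cookie value, visit date).
VISITS = [
    ("cookie-a", date(2005, 3, 7)),
    ("cookie-b", date(2005, 3, 7)),
    ("cookie-a", date(2005, 3, 7)),
    ("cookie-a", date(2005, 3, 9)),
    ("cookie-c", date(2005, 3, 10)),
    ("cookie-b", date(2005, 3, 12)),
]


def visits_in(visits, start, end):
    """Visits whose date falls between start and end, inclusive."""
    return [(vid, d) for vid, d in visits if start <= d <= end]


def unique_visitors(visits, start, end):
    """Each visitor is counted once, and only once, for the reporting period."""
    return len({vid for vid, _ in visits_in(visits, start, end)})


def visits_per_visitor(visits, start, end):
    """Ratio of visits to unique visitors for the period (0.0 if no visitors)."""
    uniques = unique_visitors(visits, start, end)
    return len(visits_in(visits, start, end)) / uniques if uniques else 0.0


if __name__ == "__main__":
    monday, sunday = date(2005, 3, 7), date(2005, 3, 13)
    print("Monday uniques:", unique_visitors(VISITS, monday, monday))  # 2
    print("Week uniques:", unique_visitors(VISITS, monday, sunday))    # 3
    print("Visits per visitor:", visits_per_visitor(VISITS, monday, sunday))  # 2.0
```

Summing the daily unique visitor counts for that week (2 + 1 + 1 + 1) gives 5, yet only 3 distinct visitors came to the site, which is why uniqueness must always be tied to a specific timeframe.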
Anything online that drives visitors to your web site is said to “refer” traffic to you, hence the term referrer. Referrers are generic web sites, search engines, banner ads, weblogs, email, and affiliates: basically online sources that inspire unique visitors to visit your web site and generate page views. All that is required of a referrer is that it can be identified based on information contained in the HTTP request. The following logfile shows some example referrers:
22.214.171.124 - - [15/May/2000:23:03:36 -0800] "GET /index.htm HTTP/1.0" 200 956 "http://www.webanalyticsdemystified.com" "Mozilla/2.0 (compatible; MSIE4.0; SK; Windows 98)"
126.96.36.199 - - [15/May/2000:23:03:42 -0900] "GET /mail/email_marketing.htm HTTP/1.0" 200 956 "http://www.altavista.digital.com/cgi-bin/query-bin/query?pg=aq&text=yes&d0=1%2fnov%2f99&q=email+marketing%2a&stq=30" "Mozilla/4.05 [en] (Win 95; I)"
188.8.131.52 - - [15/May/2000:23:03:56 -0300] "GET /index.htm HTTP/1.0" 200 956 "http://www.oreilly.com/lists/links.php?link_list_id=134" "Mozilla/4.0 (compatible; MSIE4.01; Windows 98)"
The example shows that:
One visit started when a unique visitor came from http://www.webanalyticsdemystified.com at 23:03:36 requesting the file index.htm.
A second visit started when a unique visitor came from an AltaVista search for “email marketing” at 23:03:42 requesting /mail/email_marketing.htm.
A third visit started when a unique visitor came from a link list at the O’Reilly web site (http://www.oreilly.com/lists/links.php?link_list_id=134) at 23:03:56 requesting the file index.htm.
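Because referrers come from the HTTP request, they can be recovered from logfiles like the one above. The following sketch parses the three example entries with a regular expression for the combined log format and reports where each visit came from. Treating the q parameter as the search phrase is an assumption that happens to fit the AltaVista URL here; other search engines use other parameter names.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Combined log format: host ident user [time] "request" status bytes
# "referrer" "user agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# The three entries from the example logfile, one string per entry.
LOG_LINES = [
    '22.214.171.124 - - [15/May/2000:23:03:36 -0800] "GET /index.htm HTTP/1.0" '
    '200 956 "http://www.webanalyticsdemystified.com" '
    '"Mozilla/2.0 (compatible; MSIE4.0; SK; Windows 98)"',
    '126.96.36.199 - - [15/May/2000:23:03:42 -0900] '
    '"GET /mail/email_marketing.htm HTTP/1.0" 200 956 '
    '"http://www.altavista.digital.com/cgi-bin/query-bin/query?pg=aq&text=yes'
    '&d0=1%2fnov%2f99&q=email+marketing%2a&stq=30" '
    '"Mozilla/4.05 [en] (Win 95; I)"',
    '188.8.131.52 - - [15/May/2000:23:03:56 -0300] "GET /index.htm HTTP/1.0" '
    '200 956 "http://www.oreilly.com/lists/links.php?link_list_id=134" '
    '"Mozilla/4.0 (compatible; MSIE4.01; Windows 98)"',
]


def describe_visit_source(line):
    """Return (timestamp, requested path, referring source) for one log line."""
    match = COMBINED.match(line)
    if match is None:
        return None
    path = match["request"].split()[1]  # "GET /path HTTP/1.0" -> "/path"
    referrer = urlsplit(match["referrer"])
    source = referrer.netloc
    # Assumption: the search phrase travels in a "q" query parameter.
    terms = parse_qs(referrer.query).get("q")
    if terms:
        source += " (search: " + terms[0] + ")"
    return match["time"], path, source


if __name__ == "__main__":
    for entry in LOG_LINES:
        print(describe_visit_source(entry))
```

Note that parse_qs decodes the query string, so “email+marketing%2a” becomes the phrase “email marketing*”, which is what the visitor actually typed into AltaVista.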
The best working definition of a referrer is as follows:
The second half of this definition was added in recognition that email is a very important component of Internet marketing efforts, but many email applications don’t provide referring URLs. When analyzing a referring URL, you should examine the entire URL—the http://www.oreilly.com/books/hacks/websitemeaurementhacks.html plus any information contained in the query string (the stuff after the ? in a dynamic URL)—so you can reconstruct the exact and entire page that contained the original link. If you cannot, hopefully you’re able to embed information into the requesting URL that describes the medium and message that contained the referring link.
As you can see in Figure 1-2, while this visitor was referred to the Web Analytics Demystified web site from an email, we can determine that he came from the December 2004 campaign (campaign=Dec2004), he clicked on a “buy now” link, the creative was an image (creative=image), and the link identifier was 54412 (id=54412). Any good web measurement application [Hack #3] will be able to leverage this information, usually using campaign and email tracking functionality [Hack #41].
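Extracting tracking parameters like these is mostly a matter of parsing the query string. In this sketch the landing URL is hypothetical (only the campaign, creative, and id parameters come from the example), and it uses nothing beyond Python’s standard urllib.parse module.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical landing URL for an email link; the parameter names mirror the
# example (campaign, creative, id), but the path itself is invented.
LANDING_URL = (
    "http://www.webanalyticsdemystified.com/buy.html"
    "?campaign=Dec2004&creative=image&id=54412"
)

# The tracking parameters this sketch looks for.
TRACKED_PARAMETERS = ("campaign", "creative", "id")


def campaign_details(url):
    """Pull campaign-tracking parameters out of a requested URL's query string."""
    query = parse_qs(urlsplit(url).query)
    # Keep the first value of each tracked parameter that is present.
    return {name: query[name][0] for name in TRACKED_PARAMETERS if name in query}


if __name__ == "__main__":
    print(campaign_details(LANDING_URL))
```

Because the parameters ride along on the requested URL rather than in the referrer, this works even when the email application supplies no referring URL at all.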
At the end of the day, each term is part of the framework for web site measurement. Make sure you really understand the subtleties associated with each one; using “visits” when you mean “unique visitors” can have a profound effect on someone else’s understanding. When you really get this—when you talk the talk, as it were—you’re going to be saying things like:
“We’re looking more closely at a dramatic increase in the average number of page views to our corporate policies over the last week.”
“The ratio of visits to unique visitors from our biggest online partners has dropped off significantly, so we’ve contacted them to see if they’ve somehow modified the message on their end.”
“We’re generating 20 times the page views per visit from our most recent campaign. Since our average advertising CPM is over $30, this is pretty significant from a revenue standpoint.”
“Hits? I’m sorry, we don’t use that term around here unless we’re talking about baseball.”
You get the idea. Talk the talk, and everything else falls into place.