Web Site Measurement Hacks: Tips & Tools to Help Optimize Your Online Business

By Eric T. Peterson
Table of Contents

Chapter 1: Web Measurement Basics
Many people consider the "basics" of web measurement anything but. Loaded with confusing and ambiguous terminology, dependent on any number of potentially fallacious assumptions, and often considered the domain of data-loving geeks, no wonder business people have historically eschewed web data analysis for softer and fuzzier endeavors like paid usability studies and online surveys.
But no longer!
Web measurement applications and the vendors that provide them have made great strides in the last few years, making their applications easier to understand and easier to use. The major players are starting to agree on a common vocabulary and working through some of the historical problems with data collection. More and more business people have responded, taking interest in web measurement and actually assigning resources to analyze the resulting data.
Funny how a major economic downturn and the enforcement of fiscal responsibility will motivate people to make decisions based on available data, not just their gut instinct.
Most companies measure their web activity because they have an interest in knowing how well their marketing and advertising budget is being spent. Consider the plight of the average vice president of Internet marketing for a company of any appreciable size. He is likely responsible for the web site, email messaging, banner advertising, paid keyword marketing, organic search, internal search, content, and the online extension of the brand. Given this list and the associated costs of developing and maintaining each piece of marketing collateral, how could he possibly hope to make good decisions without data?
Whether you're in charge of site design and development, usability, marketing, customer communication, customer support, lead generation, online sales, brand messaging, product marketing—trust me, this list goes on and on—you need web measurement data to help inform your job.
Think about it. Do you want your airline pilot flying based on available atmospheric and flight pattern data or gut feel? Do you want your doctor to recommend a treatment after just glancing at you or would you like her to run a few tests? Do you want your automobile mechanic to recommend service for your car after just giving it a listen?
Additional content appearing in this section has been removed.
Purchase this book now or read it online at Safari to get the whole thing!
Talk the Talk
Learning how to talk the web measurement talk is the first step in really taking advantage of the data, especially if your hope is to someday become a professional "web data analyst."
In web measurement, terminology is tremendously important. Because so few people have experience measuring activity on the Internet, it is important to explain the most important terms and how they're used. If you're technically inclined, this hack is designed to help you understand how the bits and bytes are translated into information about human activities. If you're more marketing oriented, this hack will help you understand where the information comes from.
Figure 1-1 illustrates the relationship between the basic terms. As you can see, as the volume of available data decreases, the value of that information increases. At the bottom of the pyramid and in greatest volume, we have "hits," and at the top, we have "unique visitors," the holy grail of "things that can be measured."
Figure 1-1: The pyramid model of web measurement data
Even if you already "talk the talk," recognize that many of these terms are loosely defined, and the strict definitions that follow serve as the foundation for the rest of this book.
The term hit is perhaps the most overused and misunderstood word in the entire web measurement vocabulary. People talk about "site hits," "page hits," and "hits from search engines" ad nauseam. The best definition of a hit is provided by WebTrends:
(A hit is) an action on a web site such as when a user views a page or downloads a file.
When you read the definition of a page view, you'll be struck by the similarity of the two definitions, but consider the words "or downloads a file." Files, in this context, include executable files; PDFs; sound files; JPEG, PNG, and GIF images; etc. The problem is that the "page" that appears in your web browser is technically the aggregate of potentially hundreds of "hits"—every image and page element is counted as a hit.
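To make the difference between hits and page views concrete, here is a minimal Python sketch. The extension list and the sample requests are assumptions chosen for illustration; real measurement applications apply their own, usually configurable, rules for what counts as a page.

```python
from urllib.parse import urlparse

# Hypothetical rule: requests for these file types count as hits but not as page views.
NON_PAGE_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".pdf", ".exe", ".mp3"}


def is_page(url):
    """Return True if the requested URL looks like a page rather than a supporting file."""
    path = urlparse(url).path.lower()
    dot = path.rfind(".")
    slash = path.rfind("/")
    # Only treat the text after the last dot as an extension if it is in the final path segment.
    extension = path[dot:] if dot > slash else ""
    return extension not in NON_PAGE_EXTENSIONS


def count_hits_and_page_views(requested_urls):
    """Every request is a hit; only requests for pages are page views."""
    hits = len(requested_urls)
    page_views = sum(1 for url in requested_urls if is_page(url))
    return hits, page_views


# Made-up requests generated by a browser loading a single page.
requests_for_one_page = [
    "/index.html",
    "/images/logo.gif",
    "/images/banner.jpg",
    "/styles/site.css",
    "/scripts/menu.js",
]

hits, page_views = count_hits_and_page_views(requests_for_one_page)
print(hits, page_views)  # 5 1
```

In this made-up example, one page produces five hits, which is why hit counts say little about how many pages people actually viewed.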
Best Practices for Web Measurement
To truly be successful with your online business, you need to treat web measurement as a business practice and be willing to invest time, effort, and money as necessary.
Web measurement is not a silver bullet; in fact, outside the realm of law enforcement and werewolf hunting, there are no silver bullets. In order to be successful with your web measurement program, you have to treat it like any other business process through such things as customer relationship management (CRM), sales force automation (SFA), and enterprise resource planning (ERP). There needs to be an abbreviation for web measurement—for example, "WMO" for "web measurement and optimization," which captures the fact that you measure for improvement's sake, or "SMI" for "site metrics integration," which expresses the need to integrate your metrics with other site operation strategies. Perhaps that's all that stands in the way of web measurement becoming widely used inside organizations: an appropriate abbreviation.
The following best practices, if rigorously followed, will help you identify changes you can make that will dramatically improve your site.
A common mistake that many companies make is to rush out to purchase software or services before they develop sound reasons for doing so, a mistake not exclusive to web measurement. While occasionally these companies are able to back into the rationale for the purchase, a better approach is to actually sit down with those in charge and explore what you hope to gain by an investment in web measurement in advance. This is usually the best place to begin implementing a web measurement strategy: clearly identifying your site's business objectives [Hack #38] . Some examples of clear reasons for investment include:
  • "We're a retailer and our margins are very low. We want to increase the number and value of online purchases while making sure that our marketing dollars are not wasted."
Select the Right Vendor
One of the most important decisions you will make in web measurement is which vendor you're going to work with, keeping in mind that some are better than others but there is no one "best" vendor for every company's needs.
The web measurement arena is littered with software and service vendors, some good and some bad, yet all eager to take your money. The vendor selection process is often the most painful step in setting up a web measurement program. Understanding the major differences between types of vendors and seeing a brief synopsis of the top vendors in the market can make this process a little less painful.
Vendors can be categorized along two major axes: delivery type and the data collection mechanism. The delivery type characterizes how you use the vendor's services, and falls into two broad categories: software, which you install on your own servers, and hosted services, which are maintained by the vendor. The data collection mechanism describes how the vendor collects data, such as web server logfiles [Hack #22] or client-side JavaScript page tags [Hack #28] . Since a handful of vendors are now supporting both data collection mechanisms, and since often delivery type defines which data model you'll use, we'll focus on delivery type.
The software model for web measurement applications is essentially the "original" model—one very well understood and widely deployed. Companies generally choose software because they seek flexibility from the application and prefer to own the process from beginning to end. Software may be more expensive in terms of up-front fees and first-year investment, but cost savings are usually appreciated in the second and subsequent years when maintenance fees are 17–22 percent of first-year costs (this will make sense when you read about hosted service model pricing). If you go the software route, you need to be ready to support the application internally, maintaining the software when necessary as well as the hardware it runs on. Software typically uses web server logfiles (Figure 1-4) as a data source.
Staff for Web Measurement Success
Selecting the right vendor is only half the battle in successfully building a web measurement program for your company. You should also be thinking about dedicating resources to manage, maintain, and evangelize the data throughout your company.
Many companies make the mistake of simply selecting an application and assuming that all of their problems are solved. The notion of the "silver bullet" software or service is surely attractive, but unfortunately, no such bullet exists. Companies who assume that a technical or marketing resource will be able to spend a little time each week poking around in the measurement application when they have spare cycles nearly always fail to take full advantage of their investment in analytics. In other words, the old adage that "you get what you pay for" holds true in staffing, as it does in vendor selection.
Some companies will inevitably not be able to afford to hire a dedicated staffer to manage their measurement program, especially those companies who are spending less than $25,000 on the entire application and implementation. In these cases, it is recommended that companies at least dedicate one-half of a single person's time to managing the measurement program. Empower a motivated employee to make sure the implementation is good and that people at least understand that the information is available.
Most companies need to dedicate at least one full-time resource to their web measurement efforts. Ideally, this is one person who has enough technical skill to manage and tweak the application's implementation but enough business savvy to translate the data into something that the entire organization can use [Hack #91]. Larger organizations should plan to hire more than one resource, especially if the initial investment in a measurement package is particularly large or if a significant number of people in the company will likely use the data on a regular basis. The logic is that no one person can support an army; you need to distribute the responsibility and allow a team of data analysts to support the company.
Get to Know Your Visitors
Knowing that you have three totally different kinds of unique visitors coming to your web site can save you time, money, and headaches.
Visitors are the fundamental currency in web measurement—the top of the pyramid, as shown in Figure 1-5. The idea of the "known visitor" is important to web measurement because, given the right message, any good salesperson can sell; the sales process breaks down when forced to sell to anonymous groups. Think about the difference between a really good car salesperson and a car commercial on TV—the commercial may sway you one way or the other, but it's the salesperson speaking to you directly who will seal the deal. Because of the universal desire to "know" the visitor, it's important to understand the three major visitor categories: totally anonymous, mostly anonymous, and known (Figure 1-5).
Figure 1-5: The three types of unique visitors
Some of the visitors who come to your web site will be truly and potentially forever anonymous; there's very little you can truly know about them except where they came from (their referring URLs) and which pages they view during their visits. This small but persistent group may be as many as 15% of Internet users who disable all cookies, surf through proxies that hide their IP addresses, or otherwise work to obfuscate their identities. While there are alternatives to using cookies [Hack #17] to determine the relative uniqueness of a visitor, anyone with motivation, desire, and a basic understanding of how Internet browsers work can prevent you from knowing much about him at all. The fact that this group exists is reason enough to exclusively use first-party cookies in your analysis [Hack #16].
The bulk of people who visit your site are mostly anonymous: people who don't go out of their way to hide from you but also don't offer up any truly useful personal information. Most of your Internet audience will accept cookies
Understand Common Data Sources
Before you get started analyzing, determine where the data will be coming from.
In the early days of web site measurement, there was only a single source of data, web server logfiles [Hack #22] . Generated automatically by web servers like Apache and Microsoft Internet Information Server, these flat text files were simply a report of which IP addresses were requesting which objects. At one point, a smart and enterprising soul realized that these files could be parsed and that the results could tell you roughly what people were doing on your web site. Some say, "In the beginning there were logfiles and they were good.…" Unfortunately, web server logfiles weren't good enough.
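As a rough illustration of what "parsing a logfile" means, the Python sketch below reads lines in the Apache-style Common Log Format and counts requests per IP address. The excerpt does not say which log format it has in mind, so the format, the sample lines, and the function names are all assumptions.

```python
import re
from collections import Counter

# Assumed format: Common Log Format, e.g.
# host ident user [time] "method path protocol" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    return match.groupdict() if match else None


def requests_by_host(lines):
    """Count how many objects each IP address requested."""
    counts = Counter()
    for line in lines:
        entry = parse_log_line(line)
        if entry is not None:
            counts[entry["host"]] += 1
    return counts


# Made-up sample lines for illustration only.
sample_lines = [
    '192.168.17.32 - - [03/Jan/2005:17:08:00 +0000] "GET /index.html HTTP/1.0" 200 2326',
    '192.168.17.32 - - [03/Jan/2005:17:08:01 +0000] "GET /images/logo.gif HTTP/1.0" 200 512',
    '192.168.17.40 - - [03/Jan/2005:17:09:12 +0000] "GET /products.html HTTP/1.0" 200 4100',
]

for host, count in sorted(requests_by_host(sample_lines).items()):
    print(host, count)
```

A report like this shows which addresses requested which objects, but nothing about whether a person or a program made the request.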
People began to see problems creeping into their log-based analysis—missing information and requests that could in no way be coming from a human being—and gradually programmers realized that something better would be needed. Problems arose from the emergence of forward caching devices, the addition of page caching in the browsers, and the explosion of nonhuman user agents attempting to catalog the rapidly expanding Internet. As more and more people came online, the caching devices were needed to improve the overall browsing experience, and while they helped the average surfer connecting with a 28k modem, they dramatically impacted the accuracy of log-parsing applications.
Packet sniffing, also referred to as the network data collection model [Hack #8] , basically puts a listening device on a major node in the web delivery architecture, as shown in Figure 1-6. The listening device would then passively log requests for resources, essentially sitting in front of your web server farm. While sniffing provided some advantages—centralized data collection, more details about failed or cancelled requests, and improved accuracy in server overload conditions—few applications ever supported the model and it never really took off.
Understand Visitor Intent
Despite the great body of knowledge that web measurement applications help you collect and organize, the intent of visitors when they come to your web site is nearly always part of the "great unknown" online. By recognizing this, you can often improve your overall understanding of the metrics, improving their value and use.
If you ask any online retailer why people come to their web sites, they usually answer, "Well, to buy things of course!" While it sounds like a great answer, it is usually not true; if it were, retailers' buyer conversion rates would be much higher than the three percent widely reported. Most people who visit online retail sites are simply browsing or doing research and have no intention of making an online purchase. The problem is that order and buyer conversion rates [Hack #39] are built from the assumption that every visitor is a potential purchaser, a fallacious assumption if there ever was one.
If retailers could build calculations based not on the entire audience, but only on the people who actually intended to purchase, conversion rates would likely shoot through the roof. By eliminating the tire kickers from the equation, business owners could focus on resolving problems experienced by visitors who actually had potential, not just promise.
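The effect of changing the denominator can be sketched with a few lines of Python. The visitor and buyer counts below are invented for illustration; only the roughly three percent overall rate comes from the text above.

```python
def conversion_rate(buyers, audience):
    """Return buyers as a fraction of the chosen audience."""
    if audience <= 0:
        raise ValueError("audience must be positive")
    return buyers / audience


# Hypothetical numbers for one reporting period.
all_visitors = 10_000         # everyone who came to the site
intending_to_buy = 1_500      # assumed subset who arrived meaning to purchase
buyers = 300                  # completed purchases

overall = conversion_rate(buyers, all_visitors)
intent_based = conversion_rate(buyers, intending_to_buy)

print(f"All visitors: {overall:.1%}")                 # All visitors: 3.0%
print(f"Intending purchasers: {intent_based:.1%}")    # Intending purchasers: 20.0%
```

The number of purchases is identical in both calculations; only the audience it is measured against changes.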
So if you could figure out the visitor's intent, your measurement problems would be solved.
Indeed.
Easier said than done, but there are two general strategies for determining visitor intent: explicitly and implicitly.
Determining intent explicitly is actually pretty easy if you think about it: you ask. Simply pop up a window when visitors arrive at your site and ask them in as polite a way as possible, "Why are you here today?" Present them with a reasonable list of options (for example, "To make a purchase," "To research your products and services," and "To get customer support") and pass their response to your measurement application.
Know When to Use Packet Sniffing
Network data collectors, or "packet sniffers," create an alternative data source that has a handful of benefits, provided that you maintain their upkeep.
Users respond not only to a site's content, but also to its delivery, which includes factors such as speed, quality, and reliability. Together, content and delivery influence what users choose to view, how long they view it, how they navigate through the site, and ultimately whether they will return. All the compelling content in the world won't save a web site that can't deliver it well.
Many options exist to get information about what content was served. But to get the delivery information, in order to get a complete picture of user behavior, you can turn to collecting data at the network level. This is commonly referred to as network collection (or using a sniffer, but Sniffer® is a registered trademark of Network General Corporation to describe its line of protocol analyzers, so we won't use that term here).
Because of the design of the network layers in a computer system, the low-level details about the network packets are unavailable to the web server—this is a good thing, as it allows the web server to concentrate on serving web content. However, by the time the server sees the transaction, much of the underlying performance data is lost, or has been modified into something less useful.
For instance, a web server can log when it sent some content, but cannot know if the client actually got it, or if the client didn't get it, how much it got before the client stopped the transaction. Using collection methods such as page tagging, you can capture more granular information about page deliveries, but cannot determine why a transaction was slow or failed. In general, application-level loggers (web logs, page tagging, server plug-ins, etc.) cannot report:
  • Client-initiated disconnects (for example, users hitting the stop button in mid-download)
Write a Useful Web Measurement Request for Proposal (RFP)
Business people love to create complex RFPs when they're making a purchase decision. Here's why this is a bad idea and what to do about it.
Requests for proposal (RFPs) are a very popular and inefficient way to select a web measurement vendor. Yes, inefficient. While you may think that creating a list of your requirements and then asking vendors how well their software satisfies this list would be a great way to get started, it's not. Here's why.
First, I have never in my life come across a company that could assemble a list of requirements in an efficient manner. Usually some poor soul is forced to walk around asking people what their needs are, trying to keep track of whether the same needs have already been mentioned, and compiling a ridiculous list that then needs to be refined.
Second, nobody ever seems to come up with a practical list. RFPs usually read like "we need everything that is currently available, everything on every vendor's development roadmap, and a handful of features not likely to be available until humanity masters cold fusion and builds a bridge to the moon."
Third, no matter how impractical the list, every vendor you send it to will be able to satisfy every requirement better and cheaper than their competition. How is this possible? It's not, but do you really expect sales people to tell the truth in a document where if they do, they will most likely be disqualified?
I didn't think so.
So what can you do if creating RFPs is a complete waste of time, energy, and resources and you'll still end up with the same answer you'd likely get if you just read any good analyst report? One thing you can do is write a better RFP! Here are four simple things you can do to write a better request for proposal, if you really feel you must.
A common mistake companies make when writing RFPs is trying to figure out what their needs will be well into the future and adding those requirements to their list. While it's a good idea to think about the near-term future, it's a much better strategy to focus on the problems that you can solve immediately (or nearly so). While this sounds counterintuitive, experience tells us that most companies that purchase technology to support future measurement needs often pay for functionality they never use. If you focus on the technology you're confident you'll use in the next 12 months, you'll get more short-term wins that will leave everyone feeling better about your decision.
Find a Free or Cheap Web Measurement Solution
Free or inexpensive packaged solutions are a great place to get your feet wet with web site measurement.
It's easy to get the impression that web measurement is a costly and time-consuming endeavor, one requiring deep pockets and significant expertise.
While time and effort are absolutely required to truly understand your web site visitors, plenty of inexpensive and even free software is available to support your efforts. In fact, the very first web measurement applications, Webstats and Analog, are freely available and nominally supported by the open source movement to this day.
An important consideration regarding free and inexpensive measurement solutions is the old adage you get what you pay for. While this does not mean that low-cost solutions are necessarily bad, it's more a reflection that open source and entry-level web measurement solutions are often less well maintained, documented, or supported than their business-class counterparts (at least at the time of this writing). If you go this route, you need to simply be aware of this and be prepared to have to look a little harder or wait a little longer for help when problems arise (and trust me, they always arise!).
The following table lists several free measurement solutions, where to get them, which data sources they use, how the applications are delivered, and what makes them worth mentioning in this hack (Table 1-3).
Table 1-3: Freely available web measurement solutions
Name and URL | Data source(s) used | Delivery type/platforms | What's good or cool about this solution
Analog
Use Analog to Process Logfiles
Analog is purportedly the most popular web server logfile analyzer in the world. If you're just getting your feet wet in web measurement, you might want to give it a quick try.
Analog, written by Dr. Stephen Turner (one of the original web measurement hackers), is an easy-to-install and highly flexible web server log analyzer, one that hundreds of thousands of people have likely tried at one time or another just to get a taste of web measurement. Since it's completely free, you might want to take the time to download and install the application and generate a few reports just for fun.
Analog is made freely available by Dr. Turner at www.analog.cx. At the time this book was written, versions of the application were available for Windows; Macintosh OS 8, OS 9, and OS X; dozens of flavors of Unix, including BSD, Linux, HP-UX, and Solaris; and a motley collection of non-Unix platforms like BeOS, Novell Netware, and OpenVMS. You can also download the source code to compile on any known platform, if it suits your fancy.
Visit http://www.analog.cx/download.html and select the version of Analog that works best for you.
Once you've downloaded the application, on most platforms all you need to do is extract it using whichever application you normally use to extract archives. The archive will unpack into an install directory, usually analog [version], where [version] is the version of Analog you've downloaded. For example, if you downloaded Analog Version 6.0, you're going to end up with a directory called analog 6.0.
To test your installation, simply navigate to the directory created and run Analog in the normal way for your operating system. For example, on Linux, you would just type ./analog, and on Windows, you would click on the executable analog.exe. In Windows, a DOS window will open and close, but a report file called Report.html should be created in the analog directory—one that when opened will present you with a sample report. If something went wrong, look for the
Build Your Own Web Measurement Application: An Overview and Data Collection
If you've got passable Perl skills and the desire to control your own destiny, you can use our code and build a simple page-tag analyzer.
The first hack in our "Build Your Own Web Measurement Application" series describes how the data will be collected. We'll be using a JavaScript page tag [Hack #28] and, to the best of our knowledge, ours is the only freely available tag-based reporting application available today.
Figure 1-12: Analog reports modified using Report Magic
There are two components in our data collection strategy. The first is a piece of JavaScript code that must be inserted into every page on your web site. When the visitor's web browser renders the page, the script is executed, causing a request for an image to be made to the web server. For now, the image URL contains basic information about the page and the referrer, although we shall see how to augment it in [Hack #90] .
The second component is a program that runs on the server. It writes the page and referrer information into a web server logfile, and then returns the image the browser is waiting for, which is an invisible one-pixel transparent image.
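The server-side half of this arrangement can be sketched as follows. The hack itself uses Perl; this Python version is only an illustration of the idea, and the query parameter names `page` and `ref`, the function names, and the way the cookie value is passed in are all assumptions rather than the book's actual code.

```python
import base64
import io
import time
from urllib.parse import parse_qs, urlencode

# A widely used one-pixel transparent GIF, stored as base64 to keep the sketch self-contained.
TRANSPARENT_GIF = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)


def build_log_line(query_string, client_ip, cookie, now=None):
    """Turn the image request's query string into one space-separated log line."""
    params = parse_qs(query_string)
    page = params.get("page", ["-"])[0]      # hypothetical parameter name
    referrer = params.get("ref", ["-"])[0]   # hypothetical parameter name
    timestamp = int(time.time() if now is None else now)
    return f"{timestamp} {client_ip} {page} {referrer} {cookie}"


def handle_tag_request(query_string, client_ip, cookie, log_file, now=None):
    """Record the request, then return the content type and bytes of the invisible image."""
    log_file.write(build_log_line(query_string, client_ip, cookie, now) + "\n")
    return "image/gif", TRANSPARENT_GIF


# Simulate the request a page tag might send; an in-memory buffer stands in for the logfile.
query = urlencode({
    "page": "/index.html?from=google",
    "ref": "http://www.google.com/search?q=widgets",
})
log = io.StringIO()
content_type, body = handle_tag_request(
    query, "192.168.17.32", "192.168.17.32.85261104772101338", log, now=1104772080
)
print(log.getvalue(), end="")
```

In a real deployment this logic would sit behind a CGI script or web framework route, which would supply the query string, client address, and cookie from the incoming request.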
The logfile we build will look something like this:
	1104772080 192.168.17.32 /index.html?from=google http://www.google.com/search?q=widgets 192.168.17.32.85261104772101338
	1104772091 192.168.17.32 /products.html http://www.example.com/index.html?from=google 192.168.17.32.85261104772101338
The first field on each line is the time of the request in Unix time (seconds since 1/1/1970). The second field is the client's IP address, which the server knows. The third is the URL of the page; the fourth is the URL of the referring page (the page that linked to this one); and the fifth is the visitor's cookie (in this case, generated by Apache's mod_usertrack
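A logfile in the five-field layout described above (time, IP address, page, referrer, cookie) can be read back with a short script. The sketch below is in Python rather than the hack's Perl, and the names `LogEntry`, `parse_entry`, and `pages_per_visitor` are hypothetical; it assumes each entry sits on a single line with fields separated by whitespace.

```python
from collections import namedtuple
from datetime import datetime, timezone

LogEntry = namedtuple("LogEntry", "time ip page referrer cookie")


def parse_entry(line):
    """Split one log line into its five fields, or return None if it is malformed."""
    fields = line.split()
    if len(fields) != 5:
        return None
    timestamp, ip, page, referrer, cookie = fields
    when = datetime.fromtimestamp(int(timestamp), tz=timezone.utc)
    return LogEntry(when, ip, page, referrer, cookie)


def pages_per_visitor(lines):
    """Group the pages viewed by cookie, in the order they appear in the log."""
    visitors = {}
    for line in lines:
        entry = parse_entry(line)
        if entry is not None:
            visitors.setdefault(entry.cookie, []).append(entry.page)
    return visitors


sample_log = [
    "1104772080 192.168.17.32 /index.html?from=google "
    "http://www.google.com/search?q=widgets 192.168.17.32.85261104772101338",
    "1104772091 192.168.17.32 /products.html "
    "http://www.example.com/index.html?from=google 192.168.17.32.85261104772101338",
]

for cookie, pages in pages_per_visitor(sample_log).items():
    print(cookie, pages)
```

Grouping by the cookie rather than the IP address reflects the role the fifth field plays as the visitor identifier.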
Build Your Own RSS Tracking Application: An Overview and Data Collection
Content syndication via RSS and XML and blogging are extremely hot topics, but there are few tools available to track people reading and interacting with your content and articles. With a little bit of Perl knowledge, you can use our "build your own" hack to write a bare-bones RSS traffic analyzer.
If you're willing to roll up your sleeves a bit and dig into some Perl, you can significantly enhance your ability to track syndicated content compared to the little you're likely able to learn using only web measurement tools [Hack #47] . Using the following scripts to track your own RSS feeds and posts will tell you:
  • What articles and posts people read
  • Who refers people to your work
  • Where readers click out to from your posts (which links are clicked)
For syndicated content, this is pretty much it: the information you need to determine the reach and response to your blogging activities. The approach depends on a little bit more code, and it won't work on every blogging platform or every RSS reader, but because there is really no better source for this data, the results are very satisfying.
The code for this hack is relatively simple and broken into four parts:
  • The code that goes into each RSS feed or article you want to track
  • The code that the RSS feed will call (track_rss.js)
  • The code that processes the request generated by the first two blocks of code (write_rss_tag.cgi) and generates a log of your RSS activity (rss.log)
This code functions in nearly the same way as a client-side page tag [Hack #28] by leveraging a "round trip" call to an external JavaScript file.
Additional content appearing in this section has been removed.
Chapter 2: Implementation and Setup
A significant amount of investment on both your and your vendor's part will go into implementation and setup. Why? Because if you screw up your implementation, the chances that you'll do or learn anything meaningful from your analytics application decrease significantly.
The largest number of hacks presented in this book are implementation and setup hacks, because this is where the greatest opportunity really is. The more attention you pay, the more useful data you collect, the better the "hack" in your implementation, and the better your overall experience with web measurement will be.
This is when the really hard work begins. Crazy, huh? You thought sitting through all those demonstrations and negotiating contracts was the hard part. Unfortunately, for the most part, the process you've been through will seem like a piece of cake compared to implementation and training. And, more unfortunately, implementation will seem like a piece of cake compared to getting people to actually use and respond to the reports.
But don't despair!
The hacks in this chapter are written to help you do a great job with your implementation process. My recommendation is not to just read the hacks that seem relevant to you, but rather to read them all. You'll never know where that fantastic piece of trivia will show up, the one piece of information you need to better explain to your vendor what you're trying to do. Remember, knowing is half the battle. Use that knowledge to your advantage.
Just in case your implementation is not going as planned, don't panic. Keep in mind that despite years of investment in web measurement and analysis this is still pretty complicated stuff, and you'll inevitably make mistakes. If you find yourself becoming frustrated with the setup or implementation process, here are a few useful questions you should ask yourself before coming unglued:
Whose fault is it really?
Additional content appearing in this section has been removed.
Hacks 14–36: Introduction
Optimize the Implementation Process
Implementation is over half the battle in web measurement, so you want to make sure you do it right.
Once you've selected a vendor [Hack #3], the next step is to get the application up and running and begin collecting data. Regardless of whether you've gone the software or hosted service route, or whether you're using JavaScript page tags or web server logfiles as a data source, taking time to optimize the implementation process can prevent real headaches later on.
Given the complexity of data collection afforded by top measurement applications, knowing what you want to collect before you start is critical. Especially when working with a hosted service provider [Hack #3] or using page tags [Hack #28], having clear expectations about which data will be collected [Hack #19] can save you time and prevent having to explain why you can't generate critical reports. It's also worthwhile at this point to compare the list of data you'll collect to your original request for proposal (RFP) [Hack #9] to double-check that you're getting what you need. If you don't spend the time getting this right, you'll regret it later.
Unless you're working with the most bare-bones of applications [Hack #10] or building software yourself [Hack #12], I strongly recommend that you plan on spending some time with implementation support staff from your vendor of choice, especially early on. While you may have a clear vision of how you want your data collected and reports generated, many vendors do a poor job documenting their products, making it difficult to rely on the do-it-yourself attitude that so many technologists have.
Most vendors provide at least nominal implementation support for free; if you're spending more than $50,000 on software or data collection, you should seriously consider purchasing (or better, negotiating at no additional cost) at least one full day of implementation support (reported vendor pricing is listed in Table 2-1).
Additional content appearing in this section has been removed.
Improve Data Accuracy with Cookies
Cookies are a fundamental component in any web measurement solution and they come in several flavors. Because of the explosion in use of anti-spyware applications, you need to understand how cookies are commonly used and make an active decision about how they'll be used on your site.
In theory, one of the simplest ways to improve the accuracy of your analytics data is to use cookies as a data tracking mechanism. A cookie is a piece of information that is stored by your web browser and comes in two minor variations: session cookies and persistent cookies. Session cookies last only as long as the visitor is on your site and are deleted after the user closes her web browser or after some period of inactivity (typically 30 minutes [Hack #1]). Persistent cookies last beyond a single visit and have an expiration date some time in the future. Session and persistent cookies use identical technology but differ in how they're treated by security and privacy applications like the Platform for Privacy Preferences (P3P) [Hack #26].
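The difference between the two variations comes down to whether the cookie carries a lifetime. The following minimal sketch uses Python's standard `http.cookies` module to build both kinds of `Set-Cookie` header value; the function name `set_cookie_header` and the cookie names `visit_id` and `visitor_id` are assumptions made up for this example, not names used by any particular measurement product.

```python
from http.cookies import SimpleCookie
from typing import Optional


def set_cookie_header(name: str, value: str,
                      max_age_days: Optional[int] = None) -> str:
    """Return a Set-Cookie header value for a session or persistent cookie."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"
    if max_age_days is not None:
        # A lifetime makes the cookie persistent; with no lifetime the
        # browser treats it as a session cookie and drops it on exit.
        cookie[name]["max-age"] = max_age_days * 24 * 60 * 60
    return cookie[name].OutputString()


session_header = set_cookie_header("visit_id", "abc123")
persistent_header = set_cookie_header("visitor_id", "abc123", max_age_days=365)
print(session_header)
print(persistent_header)
```

Only the second header includes a `Max-Age` attribute, which is all it takes for the browser to keep the cookie beyond the current visit.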
Session cookies are typically set by web server applications and allow your analytics solution to group interactions with your web server at the visit level. With logfile-based solutions, you should enable your web server to set session cookies and configure your analytics solution to track these session cookies in your logs. Tag-based solutions [Hack #3] will typically set their own session cookies, so you should get this functionality for free. Once session cookie tracking is enabled, you can start to analyze a number of useful visit-level statistics, including total visits, pages per session, entry pages, exit pages, and clickstream data.
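As a rough illustration of how visit-level statistics fall out once every request carries a session cookie, the sketch below groups requests by cookie value and derives the numbers just listed. It assumes each request has already been reduced to a `(unix_time, session_cookie, page)` tuple; the `visit_stats` function and that tuple layout are hypothetical, not the format of any specific analytics tool.

```python
from collections import defaultdict


def visit_stats(hits):
    """Group (unix_time, session_cookie, page) tuples into visits."""
    visits = defaultdict(list)
    # Sorting by time first keeps each visit's clickstream in order.
    for when, session_id, page in sorted(hits):
        visits[session_id].append(page)
    total_pages = sum(len(pages) for pages in visits.values())
    return {
        "total_visits": len(visits),
        "pages_per_session": total_pages / len(visits) if visits else 0.0,
        "entry_pages": {sid: pages[0] for sid, pages in visits.items()},
        "exit_pages": {sid: pages[-1] for sid, pages in visits.items()},
        "clickstreams": dict(visits),
    }


hits = [
    (1104772080, "s1", "/index.html"),
    (1104772091, "s1", "/products.html"),
    (1104772150, "s2", "/index.html"),
    (1104772200, "s1", "/checkout.html"),
]
stats = visit_stats(hits)
print(stats["total_visits"], stats["pages_per_session"])
```

Without the session cookie, the same grouping has to fall back on guesses, which is why enabling it is worth the small configuration effort.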
Persistent cookies allow your analytics solution to track visitor behavior across multiple visits, which is absolutely critical in web site measurement. This is useful when you are trying to understand customer retention information, such as repeat visit and purchase activity, or understand the frequency of visit and lifetime value
Additional content appearing in this section has been removed.
Know When to Use First-Party Cookies
As consumers become more sensitive to potential invasions of their privacy, many are moving to limit your access to information about them via the use of cookies. Here's how to know when to use first-party cookies and what effect their use will have on your analysis.
There are two kinds of persistent cookies used in web measurement: first party and third party [Hack #15]. The answer to the question "When should I use a first-party cookie?" is basically "Whenever possible." In this era of increasing awareness about security and privacy, it is preferable to use first-party cookies over third-party cookies, period. It seems that rarely a week passes without news of another privacy intrusion or black hat hack; you and I may know that these intrusions rarely have anything to do with cookies, but the majority of Internet users have no clue.
Given the popularity of anti-spyware applications and the simplicity with which third-party cookies can now be removed or blocked, it is no wonder that research is beginning to show that cross-visit accuracy for these types of cookies is slipping as low as 70 percent. Sure, 70 percent is a lot, but wouldn't you prefer 100 percent accuracy from your web measurement solution?
Cookies are becoming increasingly easy to control, thanks to functionality built into the most popular browsers. Firefox has very simple tools for controlling what cookies are set, from where, and by whom (Figure 2-1) and Microsoft has provided strong controls for cookies via their implementation of the Platform for Privacy Preferences (P3P) [Hack #27]. As more and more Internet users learn about these kinds of tools, can the end of third-party cookies be far behind?
First-party cookies have many advantages. They are not subject to tightening default security settings in many of today's web browsers. They are also not likely to be deleted by anti-spyware and anti-adware programs, which go through your browser's cookies to delete any they deem to be spyware. Consequently, the accuracy of your web measurement data will be much higher if your analytics solution uses first-party cookies to track the majority of your web site activity.
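If you're unsure which kind of cookie your solution sets, the test is simply whether the cookie's domain matches the site the visitor is actually browsing. The sketch below is a deliberately simplified version of that check; the function name `is_first_party` is hypothetical, and real browsers and anti-spyware tools apply more elaborate rules than a plain suffix comparison.

```python
def is_first_party(cookie_domain: str, page_host: str) -> bool:
    """Return True if the cookie's domain covers the host of the page being viewed."""
    # A leading dot ("." + domain) is an older way of saying "and subdomains".
    cookie_domain = cookie_domain.lstrip(".").lower()
    page_host = page_host.lower()
    return page_host == cookie_domain or page_host.endswith("." + cookie_domain)


# Cookie set under your own domain while the visitor browses your site.
print(is_first_party(".example.com", "www.example.com"))
# Cookie set under a measurement vendor's domain for the same page view.
print(is_first_party("stats.example.net", "www.example.com"))
```

The first call reports a first-party cookie; the second reports a third-party cookie, the kind most likely to be blocked or deleted.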
Additional content appearing in this section has been removed.
Alternatives to Cookies
While cookies are the most widely used means to identify unique visitors, they are by no means perfect, and a handful of worthy alternatives exist.
Cookies, when properly used, provide a great service to web measurement applications, allowing unique visitors to be tracked from visit to visit and enabling valuable measurements like frequency of visit and lifetime value. Sadly, web site analysts must sometimes do without cookies. There are many reasons for that, such as the following:
  • Some visitors actively and consciously disable cookies in their web browsers due to concerns about their privacy. They are essentially opting out of being measured and tracked.
  • Some visitors may allow regular cookies set by your web site but disallow tracking cookies set simultaneously by third-party web sites.
  • Some visitors may browse your web site from handheld devices that are not always capable of keeping cookies.
  • Some web sites make a conscious decision not to cookie their visitors as a symbol of their respect for their clients' privacy. Most typically, this may be the case with the web sites of banks and other financial services.
Even if your web site attempts to set cookies, there will be a portion of your visitors for whom cookies are unavailable. Web server logfile-based site measurement tools typically uncover that 15 percent of visits to a web site do not carry a cookie. This hack will show you some alternatives.
The more sophisticated the web site measurement application, generally the more alternatives to cookies it will offer. Some vendors allow their customers to choose alternatives to cookies to enable the determination of a unique visitor and visit (sometimes referred to as "sessionization").
The following are less accurate but still often useful strategies for determining the uniqueness of a visitor.
Additional content appearing in this section has been removed.
Use Macromedia Flash Local Shared Objects Instead of Cookies
Leverage the ubiquity of Macromedia's Flash and Local Shared Objects instead of cookies.
Recent data presented by JupiterResearch suggests that the availability of cookies for use in measurement applications is at greater risk than many previously believed. One response to the "decline of cookies" is to look for other systems for tracking new and returning visitors. This hack describes a workaround based on Macromedia Flash's Local Shared Object. According to the fine folks at Macromedia:
Shared Objects are used to store data on the client machine in much the same way that data is stored in a cookie created through a web browser. The data can only be read by movies originating from the same domain that created the Shared Object. This is the only way Macromedia Flash Player can write data to a user's machine. Shared Objects can not remember a user's e-mail address or other personal information unless they willingly provide such information.
The important pieces of this definition are "in much the same way that data is stored in a cookie," "can only be read by movies originating from the same domain that created the Shared Object," and "Shared Objects can not remember a user's e-mail address or other personal information unless they willingly provide such information." Put another way, Local Shared Objects are a perfect replacement for cookies because they're just as secure and just as harmless.
The following script tests for when the Flash movie should be embedded in the page and provides a function for setting the secondary cookie. There are three main configuration parameters:
  • myUIDCookie is the name of the unique ID cookie employed on your site. The default for Apache's mod_usertrack module is "Apache."
  • myUIDFlashCookie is the name you wish to be used for the secondary cookie created by this system.
Additional content appearing in this section has been removed.
Fine-Tune Your Data Collection
One of the most important steps during implementation is fine-tuning your data collection to suit your specific needs.
One of the things that web measurement in no way lacks is available data: there are hundreds of primary reports that can be generated and thousands of secondary reports available when you begin to drill down and cross-tab within the data. While some paint the plethora of data as "good news," the converse is often true: there is definitely such a thing as "too much information" in web measurement.
This is one reason key performance indicators [Hack #94] are such a valuable management tool: they help simplify data presentation and dissemination. After you have carefully considered your data needs before you set everything up [Hack #14], the next step is to fine-tune the data you collect so that you can make effective use of the KPI framework.
From a technical standpoint, the decisions you make about data collection are driven by your choice between using web server logfiles and JavaScript page tags. The sections below describe some techniques for eliminating some of the clutter in your data for each technique.
One of the first steps in reducing clutter is to log only data that you might like to eventually analyze. In the web measurement world, a web server logfile [Hack #22] refers to a combination of as many as four individual files: error logs, access logs, referrer logs, and agent logs. Fortunately, the combined and extended log formats used by Apache, Internet Information Server, and other popular web servers remove the need to process four separate files by combining useful elements into a single entry in the access log (often called the NCSA Extended or "combined" log format).
The combined logfile looks something like this (from http://httpd.apache.org/docs/logs.html#combined):
	127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
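To see which elements the combined format packs into a single entry, it can help to pull a line apart. The following minimal Python sketch does so with a regular expression; the pattern and the field names (`host`, `ident`, `user`, and so on) are my own labels for illustration rather than anything defined by Apache, and the pattern assumes that quoted fields contain no embedded double quotes.

```python
import re

# One named group per element of the combined log format.
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_combined(line: str):
    """Return a dict of fields for one combined-format line, or None if it doesn't match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


line = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
)
entry = parse_combined(line)
print(entry["request"], entry["status"], entry["referrer"])
```

The referrer and agent fields at the end are the ones that once lived in separate referrer and agent logs.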
Additional content appearing in this section has been removed.
Define Useful Page Names and Content Groups
Make sure that everyone in your organization can decipher your page and content group names.
One aspect of implementations that is often overlooked is the importance of establishing meaningful page names and content groupings. Fight the temptation to take shortcuts during implementation, and instead strive to define useful and human-readable names for your web pages. For example, rather than allowing the overworked implementation team to create incomprehensible page names like pv_133221, invest the few extra seconds it takes to make a more meaningful name like Product View: Product ID 133221. Translating developer-speak into human-readable names dramatically increases the likelihood that non-techies will be able to make use of the information.
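If the cryptic names already exist, the translation can be automated rather than done by hand. Here is a minimal sketch of the idea in Python, using the pv_133221 example from the paragraph above; the `PAGE_NAME_RULES` list and the `readable_page_name` function are hypothetical, and a real site would need one rule for each naming pattern its developers have used.

```python
import re

# Each rule maps a developer-style name pattern to a human-readable template.
PAGE_NAME_RULES = [
    (re.compile(r"^pv_(\d+)$"), r"Product View: Product ID \1"),
]


def readable_page_name(raw: str) -> str:
    """Translate a cryptic page name, leaving unrecognized names unchanged."""
    for pattern, template in PAGE_NAME_RULES:
        if pattern.match(raw):
            return pattern.sub(template, raw)
    return raw


print(readable_page_name("pv_133221"))
print(readable_page_name("index.html"))
```

Names that match no rule pass through untouched, so you can add rules gradually as the reports reveal which names confuse people.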
If you're using a web measurement solution based on a JavaScript page tag [Hack #28], make sure you actually set a page name programmatically instead of using the document <TITLE> or script name (for example, index.asp), and always follow any directions your vendor provides regarding the script, such as converting spaces and removing illegal characters. If your data source is a web server logfile, you start at a disadvantage; generating useful page names usually requires some type of translation table [Hack #22]. Some examples of good and bad page names include:
index.html
BAD. This default filename provides little or no insight into what content is presented to the visitor. Even when this page is reported in the context of the document location (/products/productA/details/index.html), it is only nominally better.
index.asp?skuid=45552cb122
BAD. This default filename, again, even in context, because of its dependence on the information contained in the query string (
Additional content appearing in this section has been removed.
Understand Where Data Gets Lost