Metrics and the Statistics behind A/B Testing

By Bryan Gumm

As online marketers and product managers, we can choose to optimize our users' experience along several different metrics. For example, a product manager of a subscription service might be interested in optimizing retention rate (percent), and an online marketer of an e-commerce site might focus on optimizing average order value ($). Both are valid goals, but the statistics behind A/B testing differ slightly depending on the type of metric. Before delving into those nuances, we'll introduce a few core concepts.

Confidence Intervals

Suppose we know that 51.4 percent of the population of the City of San Francisco has a bachelor's degree or higher. If we were to choose 1,000 city residents at random, we might expect that exactly 514 of those people would have a bachelor's degree or higher. In practice, of course, a sample rarely matches the population figure exactly. Why not? First, depending on your sample size, it may not be mathematically possible to arrive at exactly 51.4 percent (try this example with a sample size of 100 instead of 1,000: 51.4 percent of 100 is 51.4 people, which is not a whole number). Second (and more important), by using a small sample to represent a large population, we are introducing some sampling error.
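To make the San Francisco example concrete, here is a minimal Python sketch. It is an illustration rather than anything taken from the book: the function names are hypothetical, and it assumes each resident is drawn independently (a binomial model, which is reasonable when the sample is tiny relative to the city's population). It checks whether a given sample size can hit 51.4 percent at all, computes the chance of drawing exactly 514 degree holders out of 1,000, and simulates repeated samples.

```python
import math
import random

POPULATION_RATE = 0.514  # share with a bachelor's degree or higher (from the text)


def exact_match_possible(sample_size, rate_per_thousand=514):
    """Return True if a whole number of people can equal the rate exactly."""
    return (rate_per_thousand * sample_size) % 1000 == 0


def prob_exact_count(k, n, p):
    """Binomial probability of exactly k successes in n independent draws.

    Computed in log space so the large binomial coefficient does not overflow.
    """
    log_prob = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1 - p)
    )
    return math.exp(log_prob)


def simulate_sample_counts(n, p, trials, seed=0):
    """Draw `trials` random samples of size n; return the count of successes in each."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sum(rng.random() < p for _ in range(n)) for _ in range(trials)]


possible_100 = exact_match_possible(100)    # 51.4 people is not a whole number
possible_1000 = exact_match_possible(1000)  # 514 people is a whole number
p_exact = prob_exact_count(514, 1000, POPULATION_RATE)
counts = simulate_sample_counts(1000, POPULATION_RATE, trials=500)
share_exact = counts.count(514) / len(counts)

print(f"Exact match possible with n=100:  {possible_100}")
print(f"Exact match possible with n=1000: {possible_1000}")
print(f"P(exactly 514 of 1,000): {p_exact:.4f}")
print(f"Simulated samples ranged from {min(counts)} to {max(counts)}")
print(f"Share of simulated samples with exactly 514: {share_exact:.3f}")
```

Under these assumptions, even the single most likely outcome (exactly 514) turns up in only a small percentage of samples, and the simulated counts scatter on both sides of 514. That scatter is the sampling error the paragraph above describes.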

It's usually difficult or impossible to measure the exact value of a quantity for an entire population; hence the obvious value of sampling. It seems, then, that we need a way to quantify the reliability of our sample data. We do this using estimates.
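One common way to quantify the reliability of a sample proportion is a normal-approximation confidence interval, sketched below. This is offered as an illustration under stated assumptions, not necessarily the approach the book goes on to describe: the function name is hypothetical, the sample of 530 degree holders out of 1,000 is made up for the example, and z = 1.96 corresponds to the conventional 95 percent confidence level.

```python
import math


def proportion_confidence_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a sample proportion.

    z=1.96 gives an approximate 95 percent interval. The approximation is
    only reasonable for fairly large samples with proportions not near 0 or 1.
    """
    p_hat = successes / n                             # observed sample proportion
    std_error = math.sqrt(p_hat * (1 - p_hat) / n)    # estimated standard error
    margin = z * std_error                            # half-width of the interval
    return p_hat - margin, p_hat + margin


# Hypothetical sample: 530 of 1,000 randomly chosen residents hold a degree.
low, high = proportion_confidence_interval(530, 1000)
print(f"Sample proportion: {530 / 1000:.3f}")
print(f"Approximate 95% interval: {low:.4f} to {high:.4f}")
```

In this made-up sample the observed rate (53.0 percent) differs from the population rate of 51.4 percent, yet the interval still contains 51.4 percent. The interval conveys how far the sample figure could plausibly sit from the population value.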

When we talk about statistics ...
