5. Hypothesis Testing

5.1 Introduction

Now that you know how to encode data, you can start putting it to use! A common task is to show that a statistic computed from one sample really differs from the same statistic computed from another sample. For example, you might have data from an AB test, and you’d like to say the average number of articles shared by users in the test group is higher than the average number of articles shared by users in the control group. How can you be sure the difference isn’t due to random sampling error? How can you be sure there’s really a difference? These questions fall in the domain of hypothesis testing.
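To make the question concrete, here is a minimal sketch of one way to check whether a difference in averages could plausibly come from sampling error alone: a one-sided permutation test. It is only an illustration, not necessarily the method this chapter goes on to use. The function name `permutation_test`, its parameters, and the share counts are all hypothetical and made up for the example.

```python
import random
from statistics import mean


def permutation_test(test, control, n_permutations=10_000, seed=0):
    """One-sided permutation test for mean(test) > mean(control).

    Returns the observed difference in means and an approximate p-value:
    the fraction of random relabelings of the pooled data whose difference
    in means is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean(test) - mean(control)
    pooled = list(test) + list(control)
    n_test = len(test)

    at_least_as_large = 0
    for _ in range(n_permutations):
        # Shuffling simulates "group labels don't matter" (the null hypothesis).
        rng.shuffle(pooled)
        diff = mean(pooled[:n_test]) - mean(pooled[n_test:])
        if diff >= observed:
            at_least_as_large += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (at_least_as_large + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up numbers of articles shared per user (illustrative only).
test_shares = [3, 5, 4, 6, 2, 5, 7, 4, 5, 6]
control_shares = [2, 3, 4, 2, 3, 1, 4, 3, 2, 3]

observed, p_value = permutation_test(test_shares, control_shares)
print(f"observed difference in means: {observed:.2f}")
print(f"approximate one-sided p-value: {p_value:.4f}")
```

A small p-value means that random relabeling rarely produces a difference as large as the one observed, which is evidence that the difference is not just sampling noise.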

In this chapter, we’ll address a few related questions. We’ll often frame questions in the context of AB testing because hypothesis testing is often used ...
