Commercial Data Mining by David Nettleton


Chapter 7

Data Sampling and Partitioning

Abstract

This chapter discusses various types of sampling, such as random sampling and sampling based on business criteria (age of customer, time as client, etc.). It also discusses extracting train and test datasets for specific business objectives, and considers the issue of Big Data, which is currently a hot topic.

Keywords

sampling, data reduction, partitioning, business criteria, train, test, Big Data

Introduction

Sampling is a method for selecting a subset of data from the complete dataset in order to analyze it and create models, where the subset is sufficiently representative of the whole dataset. This is important when the total data volume is very high: for example, if a bank has five million clients, ...
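
The idea of drawing a random subset and then partitioning it into train and test datasets can be illustrated with a minimal Python sketch. It is not taken from the book: the function names `random_sample` and `train_test_split`, the integer client IDs, the 1% sampling fraction, and the 70/30 split are all assumptions chosen for illustration only.

```python
import random


def random_sample(records, fraction, seed=0):
    """Return a simple random sample (without replacement) of the records."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    k = max(1, round(len(records) * fraction))
    return rng.sample(records, k)


def train_test_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and partition it into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(records)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical client IDs standing in for a large client table.
clients = list(range(100_000))

# Assumed values: a 1% random sample, then a 70/30 train/test partition.
sample = random_sample(clients, fraction=0.01)
train, test = train_test_split(sample, test_fraction=0.3)

print(len(sample), len(train), len(test))
```

Simple random sampling gives every record the same chance of selection, which is one common way to aim for a representative subset. Whether a particular sample is representative enough still has to be checked against the full dataset, for example by comparing the distributions of key variables.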
