Data Sampling and Partitioning
Abstract
This chapter discusses various types of sampling, such as random sampling and sampling based on business criteria (age of customer, time as client, etc.). It also covers extracting train and test datasets for specific business objectives and considers the issue of Big Data, which is currently a prominent topic.
Keywords
sampling
data reduction
partitioning
business criteria
train
test
Big Data
Introduction
Sampling is a method for selecting a subset of data from the complete dataset for analysis and model building, such that the subset is sufficiently representative of the whole dataset. This is important when the total data volume is very high: for example, if a bank has five million clients, ...
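As a rough illustration of random sampling followed by a train/test partition, the following minimal Python sketch draws a simple random sample from a population of five million records and splits it in two. The function name sample_and_partition, the sample size of 10,000, the 30% test fraction, and the use of consecutive integers as client IDs are all assumptions made for this example; they do not come from the chapter.

```python
import random


def sample_and_partition(records, sample_size, test_fraction=0.3, seed=42):
    """Draw a simple random sample, then split it into train and test sets.

    All parameter names and default values are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    # Sample without replacement; the result is in random selection order.
    sample = rng.sample(records, sample_size)
    n_test = int(round(len(sample) * test_fraction))
    # Because the sample is already in random order, slicing it
    # gives a random partition into test and train subsets.
    test = sample[:n_test]
    train = sample[n_test:]
    return train, test


# Hypothetical client IDs standing in for a bank's five million clients.
clients = range(1, 5_000_001)
train, test = sample_and_partition(clients, sample_size=10_000)
```

Fixing the seed makes the same sample reappear on every run, which helps when results need to be compared across modeling attempts. Sampling based on business criteria would replace the single random draw with a filter or a per-group draw.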