Practical Predictive Analytics by Ralph Winters

To sample or not to sample?

Sampling is the first step of the SEMMA process (Sample, Explore, Modify, Model, Assess), but it is not an explicit step in CRISP-DM, so I will cover it separately.

Traditionally, predictive analytics projects have started with sampling. Sampling is particularly important in industries such as pharmaceuticals and healthcare, where work begins with experimental studies. It is also important in studies in which you follow groups of people (cohorts) over a long period of time. Other kinds of data projects, however, are not research projects and are more machine learning oriented. Even so, I believe that many algorithms are easier to work with (and are more powerful) if the data follows certain statistical properties, such as transforming raw ...
