Using Random Numbers to Knock Your Big Data Analytic Problems Down to Size


Random number generators (technically, pseudorandom number generators) can play a useful role in every Big Data analysis project. All of the popular programming languages can produce pseudorandom numbers, and these numbers can be used to randomly sample large data sets in a variety of creative ways. This chapter demonstrates how Monte Carlo simulations and resampling methods can be applied to Big Data to solve commonly encountered analytic problems without resorting to advanced statistics.
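The following minimal Python sketch is not code from the chapter. It shows two ways a seeded pseudorandom generator can cut a large data set down to size. The first is reservoir sampling (Algorithm R), which draws a uniform random sample from a stream too large to hold in memory. The second is a percentile bootstrap, a simple resampling method that estimates a confidence interval for the mean without any distributional formulas. The function names `reservoir_sample` and `bootstrap_mean_ci`, the synthetic stream, and the sample sizes are illustrative assumptions.

```python
import random
import statistics

def reservoir_sample(stream, k, rng):
    """Uniform random sample of k items from an iterable of unknown length (Algorithm R)."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)        # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)      # uniform over 0..i, inclusive
            if j < k:
                sample[j] = item       # keep item i with probability k / (i + 1)
    return sample

def bootstrap_mean_ci(data, n_resamples, rng, alpha=0.05):
    """Percentile bootstrap confidence interval for the mean of data."""
    n = len(data)
    means = []
    for _ in range(n_resamples):
        resample = rng.choices(data, k=n)   # draw n items with replacement
        means.append(statistics.fmean(resample))
    means.sort()
    low = means[int((alpha / 2) * n_resamples)]
    high = means[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high

# Hypothetical "big" data: a lazily generated stream of 100,000 values in [0, 1000).
rng = random.Random(42)  # fixed seed so the run is reproducible
big_stream = (x * 0.01 for x in range(100_000))
sample = reservoir_sample(big_stream, 1_000, rng)
low, high = bootstrap_mean_ci(sample, 2_000, rng)
print(len(sample), round(low, 1), round(high, 1))
```

Passing a seeded `random.Random` instance, rather than relying on the module-level generator, makes every run reproducible, which helps when a sampling-based result has to be checked later.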


Pseudorandom number generators; Bayesian analysis; Resampling; Permuting; Monte Carlo simulations

From Principles and Practice of Big Data, 2nd Edition.
