Reservoir sampling

Given n items, where n is very large or not known in advance, reservoir sampling selects a uniform random sample of k items in a single pass over the data. The algorithm first initializes an array (the reservoir) of size k and copies the first k items from the stream into it. It then evaluates each remaining item in the stream. For the item at index i (counting from 0), it generates a random integer j between 0 and i, inclusive. If j is in the range 0 to k-1, the jth element of the array is replaced with the ith item from the stream; otherwise, the item is discarded.
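The steps described above can be sketched in a few lines. The following is a minimal illustration in Python, not the implementation behind DSC_Sample; the function name reservoir_sample and its rng parameter are hypothetical, chosen only for this example.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return up to k items drawn uniformly at random from an iterable.

    Hypothetical helper for illustration; rng is an optional
    random.Random instance so that runs can be made reproducible.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick j uniformly from 0..i (both ends inclusive).
            j = rng.randint(0, i)
            if j < k:
                # Replace slot j with the current item.
                reservoir[j] = item
    return reservoir


# Draw 5 items from a stream of one million without storing the stream.
sample = reservoir_sample(range(1_000_000), 5, random.Random(42))
```

Only the k-element reservoir is kept in memory, so the stream length never needs to be known in advance. If the stream holds fewer than k items, this sketch simply returns all of them.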

For more information about reservoir sampling, refer to: https://en.wikipedia.org/wiki/Reservoir_sampling

The k parameter of DSC_Sample sets the reservoir size k for reservoir sampling, and eventually ...
