Diversified Sampling: Mining Large Datasets for Special Cases

TL;DR: The objective is to build a small sample of the data in which special cases are likely to be represented. The strategy is to have each data item emit a list of “features” and to boost the selection probability of items whose features are rare. A shallow understanding of the data is often enough to design an effective feature extractor.
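The strategy in the TL;DR can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the article's own implementation: the names `extract_features` and `diversified_sample` are hypothetical, the toy features (length bucket, digits, non-ASCII characters, surrounding whitespace) are placeholders for whatever a shallow look at your data suggests, and weighting each item by the inverse frequency of its rarest feature is just one plausible way to boost rare features. The weighted draw without replacement uses the Efraimidis-Spirakis key trick.

```python
import math
import random
from collections import Counter


def extract_features(item):
    """Hypothetical extractor: emit a list of cheap, shallow features for a string."""
    features = [f"len_bucket:{min(len(item) // 10, 5)}"]
    if any(ch.isdigit() for ch in item):
        features.append("has_digit")
    if not item.isascii():
        features.append("non_ascii")
    if item != item.strip():
        features.append("outer_whitespace")
    return features


def diversified_sample(items, k, extract=extract_features, seed=0):
    """Draw up to k items, favoring items that carry rare features.

    Assumption: an item's weight is 1 / (count of its rarest feature).
    """
    rng = random.Random(seed)
    item_features = [extract(item) for item in items]

    # Count how many items emit each feature (each item counted once per feature).
    counts = Counter(f for feats in item_features for f in set(feats))

    keyed = []
    for idx, feats in enumerate(item_features):
        if feats:
            weight = max(1.0 / counts[f] for f in feats)
        else:
            weight = 1.0 / len(items)  # featureless items get the lowest weight
        # Efraimidis-Spirakis: key = u ** (1 / weight), computed in log space
        # for numerical stability; the k largest keys form the sample.
        u = 1.0 - rng.random()  # in (0, 1], so log(u) is always defined
        keyed.append((math.log(u) / weight, idx))

    keyed.sort(reverse=True)
    return [items[idx] for _, idx in keyed[:k]]
```

Calling `diversified_sample(records, 10)` on a list of strings returns at most ten of them. An item carrying a feature that no other item shares gets weight 1, while an item whose features are all shared by a thousand others gets weight 1/1000, so the unusual item is far more likely to land in the sample than under uniform sampling.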