August 2019
Intermediate to advanced
342 pages
9h 35m
English
Managing the different orders of magnitude of the counts is a problem that occurs in different situations, and there are many algorithms that behave badly when faced with data that exhibits a wide ranges of values, such as clustering algorithms that measure similarity on the basis of Euclidean distance.
In a similar way to binarization, it is possible to reduce the dimensional scale by grouping the raw data counts into containers called bins, with fixed amplitude (fixed-with binning), sorted in ascending order, thereby scaling their absolute values linearly or exponentially.
Read now
Unlock full access