April 2018
Principal Component Analysis (PCA) transforms data from a high-dimensional space into a space of fewer dimensions. Consider visualizing a 100-dimensional dataset: it is practically impossible to show the shape of such a high-dimensional distribution directly. PCA offers an efficient way to reduce the dimensionality by forming principal components, which are new variables that capture as much of the variability in the data as possible in a lower-dimensional space.
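To make the 100-dimensional case concrete, here is a minimal sketch in Python with NumPy. The synthetic dataset, the variable names (`X`, `Z`, `explained_ratio`), and the choice to compute the components through a singular value decomposition are all assumptions made for illustration; they are not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): 500 samples in 100
# dimensions whose variation lies mostly along 2 hidden directions,
# plus a small amount of noise.
n_samples, n_features, n_latent = 500, 100, 2
latent = rng.normal(size=(n_samples, n_latent))
mixing = rng.normal(size=(n_latent, n_features))
X = latent @ mixing + 0.1 * rng.normal(size=(n_samples, n_features))

# Center each variable by subtracting its mean.
X_centered = X - X.mean(axis=0)

# SVD of the centered data: the rows of Vt hold the weights that define
# each principal component, ordered by decreasing variance.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Fraction of the total variance explained by each component.
explained_ratio = S**2 / np.sum(S**2)

# Project onto the first two components to get 2-D coordinates
# that can be drawn as an ordinary scatter plot.
Z = X_centered @ Vt[:2].T

print(Z.shape)
print(explained_ratio[:2].sum())
```

In this sketch, `Z` has one row per sample and two columns, so the 100-dimensional data can be plotted in the plane. Because the synthetic data was built from two hidden directions, the first two components account for nearly all of the variance; real datasets will usually need more components to reach a comparable share.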
Mathematically, suppose we have p original variables, X1, X2, ..., Xp. In PCA, we look for a set of new variables, Z1, Z2, ..., Zp, that are weighted sums (linear combinations) of the original variables, after subtracting their means: