PCA is a widely used technique for dimensionality reduction, yet when we deal with large datasets that have many features, we first need to understand what's going on in the feature space. In the exploratory data analysis (EDA) phase, you'll usually draw several scatterplots of the data to understand how the features relate to one another. This is where t-distributed stochastic neighbor embedding, or t-SNE, comes to your aid: it was designed to embed high-dimensional data in a 2-D or 3-D space to make the most of a scatterplot. It is a nonlinear dimensionality reduction technique developed by Laurens van der Maaten and Geoffrey Hinton, and the core of the algorithm is based on two rules: the first is that recurrent similar ...
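
To make the idea of embedding high-dimensional data in 2-D concrete, here is a minimal sketch using scikit-learn's `TSNE` class. The Iris dataset, the `perplexity` value, the PCA initialization, and the fixed `random_state` are all illustrative assumptions, not choices taken from the text above.

```python
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE

# Iris is only a small stand-in dataset (an assumption for this sketch).
iris = load_iris()
X, y = iris.data, iris.target  # 150 observations, 4 features

# Embed the 4-D observations into 2-D.
# perplexity, init, and random_state are illustrative values, not recommendations.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
X_embedded = tsne.fit_transform(X)

print(X_embedded.shape)  # (150, 2)
```

The two columns of `X_embedded` can then be used as the x and y coordinates of a scatterplot, with `y` as the point color, to inspect how the observations group together. Because t-SNE is stochastic, the exact layout is likely to change with a different `random_state` or `perplexity`.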
