t-SNE

PCA is a widely used technique for dimensionality reduction, but when we deal with a large dataset that has many features, we first need to understand what is going on in the feature space. During the EDA phase, you will usually draw several scatterplots of the data to understand how the features relate to one another. This is where t-distributed stochastic neighbor embedding, or t-SNE, comes to your aid: it was designed to embed high-dimensional data in a 2-D or 3-D space so that a scatterplot can show as much of its structure as possible. It is a nonlinear dimensionality reduction technique developed by Laurens van der Maaten and Geoffrey Hinton, and the core of the algorithm is based on two rules: the first is that recurrent similar ...
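
As a minimal sketch of the idea of embedding high-dimensional data in a 2-D space for a scatterplot, the snippet below runs scikit-learn's `TSNE` on the Iris dataset. The choice of dataset, the `perplexity` value, the PCA initialization, and the fixed `random_state` are assumptions made for illustration only; they are not taken from the text above.

```python
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE

# Load a small example dataset: 150 samples, 4 features (an illustrative choice).
iris = load_iris()
X, y = iris.data, iris.target

# Embed the 4-D observations in 2-D. The perplexity must be smaller than
# the number of samples; 30 is an assumed, commonly used starting value.
tsne = TSNE(n_components=2, perplexity=30.0, init="pca", random_state=0)
X_2d = tsne.fit_transform(X)

# One 2-D point per original observation.
print(X_2d.shape)
```

The two columns of `X_2d` can then be passed to a scatterplot function (for example, matplotlib's `plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y)`) to inspect how the observations group together. Results can change noticeably with the perplexity and the random seed, so it is usually worth trying a few settings.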

