Visualizing High-Dimensional Data with Python
Learn how to use dimensionality reduction to better understand your data
Understanding your data is key to any data science project. Visualization helps, but it becomes challenging when the data is high-dimensional. Such data includes complex types such as text, images, and sensor measurements from fields like industry, healthcare, and transportation. You could create a scatter plot matrix, but that only shows how any two features interact and fails to capture structure across many dimensions. Not to worry: an entire subfield of machine learning, dimensionality reduction, is concerned with exactly this challenge. Dimensionality reduction algorithms can help you gain insight into your high-dimensional data and reveal whether it contains any structure.
Expert Jeroen Janssens walks you through three well-known dimensionality reduction algorithms: principal component analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP). Join in to learn when and why to use dimensionality reduction, the benefits and limitations of the various algorithms, how they work under the hood, and how to apply them using Python and the Jupyter Notebook.
What you'll learn and how you can apply it
By the end of this live online course, you’ll understand:
- The importance of visualizing high-dimensional data
- The benefits and limitations of various dimensionality reduction algorithms, including principal component analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP)
- The inner workings of these algorithms in more detail
And you’ll be able to:
- Apply dimensionality reduction algorithms in Python and the Jupyter Notebook
- Choose the right parameter settings given an algorithm and dataset
- Visualize the resulting mapping using your favorite plotting package, whether that's Matplotlib, Altair, seaborn, or plotnine
This training course is for you because...
- You’re a data scientist, BI specialist, statistician, or machine learning engineer who works with complex data.
- You want to understand how dimensionality reduction works and how it can help you.
- You aim to reveal any structure in your data through visualization.
Prerequisites
- A working knowledge of Python
- Familiarity with the scikit-learn API (useful but not required)
Recommended preparation
- Read Feature Engineering for Machine Learning (book)
- Read Interactive Data Visualization for the Web, second edition (book)
About your instructor
Jeroen Janssens is the founder and CEO of Data Science Workshops, which provides on-the-job training and coaching in data visualization, machine learning, and programming. For one day a week, Jeroen is an assistant professor at Jheronimus Academy of Data Science. Previously, he was a data scientist at Elsevier in Amsterdam and the startups YPlan and Outbrain in New York City. He is the author of Data Science at the Command Line, published by O'Reilly Media. Jeroen holds a PhD in machine learning from Tilburg University and an MSc in artificial intelligence from Maastricht University.
Schedule
The timeframes are only estimates and may vary according to how the class is progressing.
The importance of dimensionality reduction (20 minutes)
- Group discussion: What kind of data do you work with? How do you currently visualize it?
- Presentation: What dimensionality reduction is, why to use it, and an overview of dimensionality reduction algorithms
- Demo: The disadvantage of a scatter plot matrix
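To see the disadvantage for yourself, here is a minimal sketch of a scatter plot matrix, assuming pandas, Matplotlib, and scikit-learn are installed. The choice of the iris dataset is ours for illustration; the course may use different data.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed

from pandas.plotting import scatter_matrix
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.data  # 4 features -> a 4x4 grid of panels

# Each panel shows only one pair of features; with d features the
# grid has d*d panels, which quickly becomes unreadable and still
# never shows structure that spans more than two dimensions at once.
axes = scatter_matrix(df, figsize=(8, 8), diagonal="hist")
print(axes.shape)  # (4, 4)
```

Even at four features the grid is busy; at 64 features (e.g. the digits dataset) it is hopeless, which is where dimensionality reduction comes in.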
Algorithm: PCA (35 minutes)
- Presentation: An intuitive understanding of PCA (principal component analysis)
- Demo: Visualizing results using Matplotlib, seaborn, Altair, or plotnine
- Jupyter Notebook exercise: Apply PCA and visualize results
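A minimal sketch of what this exercise involves, assuming scikit-learn and Matplotlib; the digits dataset and the two-component projection are our assumptions, not necessarily the course's.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen

import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, y = load_digits(return_X_y=True)  # 1797 samples, 64 dimensions

# Project the 64-dimensional images onto the two directions
# of maximum variance.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
print(X_2d.shape)  # (1797, 2)

# How much of the total variance the 2-D mapping retains:
print(pca.explained_variance_ratio_.sum())

fig, ax = plt.subplots()
points = ax.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap="tab10", s=8)
ax.set_xlabel("First principal component")
ax.set_ylabel("Second principal component")
fig.colorbar(points, label="digit")
```

The explained variance ratio is a useful sanity check: if the first two components capture only a small fraction of the variance, the 2-D picture may hide most of the structure.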
Break (5 minutes)
Algorithm: t-SNE (55 minutes)
- Presentation: A deep dive into t-SNE (t-Distributed Stochastic Neighbor Embedding)
- Jupyter Notebook exercise: Explore the influence of the parameter perplexity
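A minimal sketch of the perplexity exploration, assuming scikit-learn; the dataset, subsample size, and perplexity values are our assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)
X = X[:300]  # subsample: t-SNE is slow on large datasets

embeddings = {}
for perplexity in (5, 30, 50):
    # Perplexity roughly sets the effective number of neighbors each
    # point attends to; it must be smaller than the sample count.
    # Small values emphasize local structure, larger values more
    # global structure.
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=0)
    embeddings[perplexity] = tsne.fit_transform(X)

for perplexity, emb in embeddings.items():
    print(perplexity, emb.shape)  # each embedding is (300, 2)
```

Plotting the three embeddings side by side (with the same plotting code as for PCA) makes the influence of perplexity visible at a glance.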
Break (5 minutes)
Algorithm: UMAP (40 minutes)
- Presentation: The difference between t-SNE and UMAP (Uniform Manifold Approximation and Projection)
- Jupyter Notebook exercise: Apply UMAP and compare results with t-SNE
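A minimal sketch of the comparison, assuming scikit-learn for t-SNE. UMAP lives in the third-party umap-learn package, so the import is guarded in case it is not installed; the dataset and parameter values are our assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)
X = X[:300]  # keep the comparison quick

emb_tsne = TSNE(n_components=2, random_state=0).fit_transform(X)

try:
    from umap import UMAP  # pip install umap-learn

    # n_neighbors plays a role similar to t-SNE's perplexity:
    # small values favor local structure, larger values global.
    emb_umap = UMAP(n_neighbors=15, random_state=0).fit_transform(X)
except ImportError:
    emb_umap = None  # umap-learn not installed

print(emb_tsne.shape)  # (300, 2)
if emb_umap is not None:
    print(emb_umap.shape)
```

Plotting both embeddings with identical styling makes the qualitative differences easy to compare; UMAP typically runs faster and tends to preserve more global structure than t-SNE.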
Wrap-up and Q&A (20 minutes)