Applied Machine Learning and AI for Engineers

by Jeff Prosise
November 2022
Intermediate to advanced
425 pages
11h 25m
English
O'Reilly Media, Inc.

Chapter 6. Principal Component Analysis

Principal component analysis, or PCA, is one of the minor miracles of machine learning. It’s a dimensionality reduction technique that reduces the number of dimensions in a dataset without sacrificing a commensurate amount of information. While that might seem underwhelming on the face of it, it has profound implications for engineers and software developers working to build predictive models from their data.
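In PCA, "information" means variance. PCA finds orthogonal directions (the principal components) along which the data varies most. The fraction of total variance each component carries tells you how much you keep when you drop the others. The sketch below is a minimal illustration that assumes a two-column dataset, because the covariance matrix's eigendecomposition then has a closed form. `pca_2d`, `explained_variance_ratio`, and `components_for_variance` are hypothetical helpers, not a library API. For real datasets, scikit-learn's `sklearn.decomposition.PCA` class reports the same quantity in its `explained_variance_ratio_` attribute.

```python
import math


def pca_2d(points):
    """Return eigenvalues (largest first), unit principal axes, and the column means."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    cx = [x - mx for x, _ in points]  # mean-centered columns
    cy = [y - my for _, y in points]
    a = sum(u * u for u in cx) / (n - 1)               # var(x)
    c = sum(v * v for v in cy) / (n - 1)               # var(y)
    b = sum(u * v for u, v in zip(cx, cy)) / (n - 1)   # cov(x, y)
    # Closed-form eigendecomposition of the covariance matrix [[a, b], [b, c]]
    mid = (a + c) / 2
    rad = math.sqrt(((a - c) / 2) ** 2 + b * b)
    l1, l2 = mid + rad, mid - rad
    if b != 0:
        v1 = (b, l1 - a)
    else:
        v1 = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(v1[0], v1[1])
    v1 = (v1[0] / norm, v1[1] / norm)
    v2 = (-v1[1], v1[0])  # perpendicular to v1, hence the second principal axis
    return (l1, l2), (v1, v2), (mx, my)


def explained_variance_ratio(eigenvalues):
    """Fraction of the total variance carried by each principal component."""
    total = sum(eigenvalues)
    return [lam / total for lam in eigenvalues]


def components_for_variance(eigenvalues, threshold):
    """Smallest k such that the top k components retain at least `threshold` of the variance."""
    ratios = explained_variance_ratio(sorted(eigenvalues, reverse=True))
    cumulative = 0.0
    for k, r in enumerate(ratios, start=1):
        cumulative += r
        if cumulative >= threshold:
            return k
    return len(ratios)


# Five points lying close to the line y = 2x
points = [(1, 2), (2, 4.1), (3, 5.9), (4, 8.2), (5, 9.8)]
eigenvalues, axes, mean = pca_2d(points)
print(explained_variance_ratio(eigenvalues))  # first component dominates
```

For these points, the first component keeps well over 99% of the variance, so you lose almost nothing by dropping the second column. `components_for_variance` uses the same cumulative-ratio rule to decide how many components to keep. It works for a list of eigenvalues from a dataset of any width.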

What if I told you that you could take a dataset with 1,000 columns, use PCA to reduce it to 100 columns, and retain 90% or more of the information in the original dataset? That’s relatively common, believe it or not. And it lends itself to a variety of practical uses, including:

  • Reducing high-dimensional data to two or three dimensions so that it can be plotted and explored

  • Reducing the number of dimensions in a dataset and then restoring the original number of dimensions, which finds application in anomaly detection and noise filtering

  • Anonymizing datasets so that they can be shared with others without revealing the nature or meaning of the data
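Here is a short sketch of the reduce-then-restore idea from the second bullet. Each point is projected onto the first principal component (two dimensions down to one) and then mapped back. The distance between the restored point and the original is the reconstruction error. Points that resemble the data PCA was fit on come back almost unchanged, but an outlier does not, so the reconstruction error can serve as an anomaly score. This is a minimal sketch that assumes two columns and a closed-form eigendecomposition. `fit_pca_2d`, `project`, `reconstruct`, and `reconstruction_error` are hypothetical names. In scikit-learn, the `PCA` class provides the equivalent `transform` and `inverse_transform` methods.

```python
import math


def fit_pca_2d(points):
    """Fit PCA to 2-D points; return the column means and the unit first principal axis."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    a = sum((x - mx) ** 2 for x, _ in points) / (n - 1)          # var(x)
    c = sum((y - my) ** 2 for _, y in points) / (n - 1)          # var(y)
    b = sum((x - mx) * (y - my) for x, y in points) / (n - 1)    # cov(x, y)
    l1 = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)     # largest eigenvalue
    if b != 0:
        v = (b, l1 - a)
    else:
        v = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(v[0], v[1])
    return (mx, my), (v[0] / norm, v[1] / norm)


def project(point, mean, axis):
    """Reduce: 2-D point -> 1-D score along the principal axis."""
    return (point[0] - mean[0]) * axis[0] + (point[1] - mean[1]) * axis[1]


def reconstruct(score, mean, axis):
    """Restore: 1-D score -> 2-D point (the inverse transform)."""
    return (mean[0] + score * axis[0], mean[1] + score * axis[1])


def reconstruction_error(point, mean, axis):
    """Distance between a point and its reduced-then-restored version."""
    rx, ry = reconstruct(project(point, mean, axis), mean, axis)
    return math.hypot(point[0] - rx, point[1] - ry)


# Fit on "normal" data that follows y ≈ 2x, then score two new points
mean, axis = fit_pca_2d([(1, 2), (2, 4.1), (3, 5.9), (4, 8.2), (5, 9.8)])
print(reconstruction_error((6, 12), mean, axis))  # fits the pattern: small error
print(reconstruction_error((6, 2), mean, axis))   # breaks the pattern: large error
```

One common approach is to flag a sample when its error exceeds a threshold chosen from the errors on known-good data. If you keep the reduced coordinates instead of restoring them, the same projection gives you a low-dimensional view of the data for plotting.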

And that’s not all. A side effect of applying PCA to a dataset is that less important features (columns of data that have less relevance to the outcome of a predictive model) are removed, and linear dependencies between columns are eliminated. And in datasets with a low ratio of samples (rows) to features (columns), PCA can be used to increase that ratio. As a rule of thumb, you typically ...
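You can check the decorrelation side effect numerically. The principal axes are eigenvectors of the covariance matrix, so rotating the data onto them produces new columns whose sample covariance is zero, up to floating-point rounding. The rotation also leaves the total variance unchanged. The sketch below assumes two columns. `covariance` and `pca_transform_2d` are hypothetical helpers. It only rotates the data. Discarding the low-variance components, which is how PCA drops less important information, is a separate step.

```python
import math


def covariance(u, v):
    """Sample covariance of two equal-length columns."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (n - 1)


def pca_transform_2d(xs, ys):
    """Rotate two correlated columns onto their principal axes; return the new columns."""
    a, c, b = covariance(xs, xs), covariance(ys, ys), covariance(xs, ys)
    l1 = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)   # largest eigenvalue
    if b != 0:
        v1 = (b, l1 - a)
    else:
        v1 = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(v1[0], v1[1])
    v1 = (v1[0] / norm, v1[1] / norm)
    v2 = (-v1[1], v1[0])  # second principal axis, perpendicular to the first
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    pc1 = [(x - mx) * v1[0] + (y - my) * v1[1] for x, y in zip(xs, ys)]
    pc2 = [(x - mx) * v2[0] + (y - my) * v2[1] for x, y in zip(xs, ys)]
    return pc1, pc2


xs = [1, 2, 3, 4, 5]
ys = [2, 4.1, 5.9, 8.2, 9.8]
pc1, pc2 = pca_transform_2d(xs, ys)
print(covariance(xs, ys))    # strongly correlated original columns
print(covariance(pc1, pc2))  # effectively zero after the transform
```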

ISBN: 9781492098041