Chapter 7. Dimensionality Reduction
Many machine learning problems involve thousands or even millions of features for each training instance. Not only do all these features make training extremely slow, but they can also make it much harder to find a good solution, as you will see. This problem is often referred to as the curse of dimensionality.
Fortunately, in real-world problems, it is often possible to reduce the number of features considerably, turning an intractable problem into a tractable one. For example, consider the MNIST images (introduced in Chapter 3): the pixels on the image borders are almost always white, so you could completely drop these pixels from the training set without losing much information. As we saw in the previous chapter, Figure 6-6 confirms that these pixels are utterly unimportant for the classification task. Additionally, two neighboring pixels are often highly correlated: if you merge them into a single pixel (e.g., by taking the mean of the two pixel intensities), you will not lose much information, removing redundancy and sometimes even noise.
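To make this concrete, here is a minimal sketch of both ideas using NumPy: cropping away the border pixels and averaging each 2×2 block of neighboring pixels. A dummy array stands in for the MNIST data, and the 2-pixel border width and 2×2 pooling window are arbitrary choices for illustration, not values from the text.

```python
import numpy as np

# Stand-in for the MNIST data: an (n_samples, 784) array of flattened
# 28x28 images (e.g., as loaded with sklearn.datasets.fetch_openml("mnist_784")).
rng = np.random.default_rng(42)
X = rng.integers(0, 256, size=(100, 784)).astype(np.float64)

images = X.reshape(-1, 28, 28)

# Drop a 2-pixel border on every side: 28x28 -> 24x24, i.e., 784 -> 576 features.
cropped = images[:, 2:-2, 2:-2]
X_cropped = cropped.reshape(-1, 24 * 24)

# Merge each 2x2 block of neighboring pixels by taking the mean:
# 24x24 -> 12x12, i.e., 576 -> 144 features.
pooled = cropped.reshape(-1, 12, 2, 12, 2).mean(axis=(2, 4))
X_pooled = pooled.reshape(-1, 12 * 12)

print(X.shape, X_cropped.shape, X_pooled.shape)  # (100, 784) (100, 576) (100, 144)
```

Even this crude preprocessing cuts the feature count by more than 80% while keeping most of the information that matters for classifying the digits.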
Warning
Reducing dimensionality can also drop some useful information, just like compressing an image to JPEG can degrade its quality: it can make your system perform slightly worse, especially if you reduce dimensionality too much. Moreover, some models (such as neural networks) can handle high-dimensional data efficiently and learn to reduce its dimensionality while preserving the useful information for the task at hand.