16

Transformers – Improving Natural Language Processing with Attention Mechanisms

In the previous chapter, we learned about recurrent neural networks (RNNs) and their applications in natural language processing (NLP) through a sentiment analysis project. However, a new architecture has recently emerged that has been shown to outperform RNN-based sequence-to-sequence (seq2seq) models on several NLP tasks: the so-called transformer architecture.

Transformers have revolutionized natural language processing and have been at the forefront of many impressive applications, ranging from automated language translation (https://ai.googleblog.com/2020/06/recent-advances-in-google-translate.html) to modeling fundamental properties of protein sequences.
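Before we work through the details in the coming sections, it may help to see the core operation of the transformer, scaled dot-product self-attention, in code. The following is a minimal PyTorch sketch rather than this chapter's implementation; the function name and the toy tensor shapes are illustrative assumptions:

import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (batch, seq_len, d_k); in self-attention, the
    # queries, keys, and values all come from the same sequence
    d_k = Q.size(-1)
    # Pairwise similarity scores between queries and keys,
    # scaled by sqrt(d_k) to keep the softmax well-behaved
    scores = torch.bmm(Q, K.transpose(1, 2)) / d_k**0.5
    # Normalize each row of scores into attention weights
    weights = F.softmax(scores, dim=-1)
    # Each output token is a weighted average of all values
    return torch.bmm(weights, V)

# Toy usage (illustrative): a batch of one sequence with
# 4 tokens and embedding size 8, attending to itself
x = torch.randn(1, 4, 8)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # torch.Size([1, 4, 8])

Note that, unlike an RNN, nothing here is sequential: every token attends to every other token via a single matrix multiplication, which is what lets transformers capture long-range dependencies and parallelize well on modern hardware.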
