You might be wondering why we did not use regular gradient descent but instead used mini-batch learning to train our neural network for the handwritten digit classification task. You may recall our discussion of stochastic gradient descent, which we used to implement online learning. In online learning, we compute the gradient based on a single training example (k = 1) at a time to perform the weight update. Although this is a stochastic approach, it often leads to very accurate solutions with much faster convergence than regular gradient descent. Mini-batch learning is a special form of stochastic gradient descent in which we compute the gradient based on a subset of k training examples out of the n total training examples, with 1 < k < n. Mini-batch ...
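To make the idea concrete, the following is a minimal sketch of mini-batch gradient descent, not the neural network implementation from this chapter; it assumes a simple linear regression model so the gradient stays short, and the names `eta`, `minibatch_size`, and `n_epochs` are illustrative choices:

```python
# Minimal sketch of mini-batch gradient descent (assumed example,
# not the book's neural network code): fit a linear model y = Xw
# by updating the weights from one mini-batch of size k at a time.
import numpy as np

rng = np.random.RandomState(1)
X = rng.normal(size=(1000, 5))             # n = 1000 examples, 5 features
true_w = np.array([2., -1., 0.5, 3., -2.])
y = X.dot(true_w) + rng.normal(scale=0.1, size=1000)

w = np.zeros(5)                            # weights to learn
eta, minibatch_size, n_epochs = 0.01, 32, 20   # k = 32 here

for epoch in range(n_epochs):
    # shuffle once per epoch so the mini-batches differ between epochs
    indices = rng.permutation(X.shape[0])
    for start in range(0, X.shape[0], minibatch_size):
        batch = indices[start:start + minibatch_size]
        # gradient of the mean squared error on this mini-batch only
        error = X[batch].dot(w) - y[batch]
        grad = X[batch].T.dot(error) / batch.shape[0]
        w -= eta * grad                    # weight update

print(w)   # should end up close to true_w
```

Note how the two extremes fall out of the same loop: setting `minibatch_size` to 1 recovers online stochastic gradient descent, while setting it to n recovers regular (full-batch) gradient descent, since every update would then use all training examples.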