Mastering Machine Learning with scikit-learn - Second Edition

Book Description

Use scikit-learn to apply machine learning to real-world problems

About This Book

  • Master popular machine learning models including k-nearest neighbors, random forests, logistic regression, k-means, naive Bayes, and artificial neural networks

  • Learn how to build efficient models and evaluate their performance using scikit-learn

  • A practical guide to mastering the basics and learning from real-life applications of machine learning

Who This Book Is For

This book is intended for software engineers who want to understand how common machine learning algorithms work and develop an intuition for how to use them, and for data scientists who want to learn about the scikit-learn API. Familiarity with machine learning fundamentals and Python is helpful, but not required.

What You Will Learn

  • Review fundamental concepts such as bias and variance

  • Extract features from categorical variables, text, and images

  • Predict the values of continuous variables using linear regression and k-nearest neighbors

  • Classify documents and images using logistic regression and support vector machines

  • Create ensembles of estimators using bagging and boosting techniques

  • Discover hidden structures in data using k-means clustering

  • Evaluate the performance of machine learning systems in common tasks

In Detail

Machine learning brings computer science and statistics together to build smart, efficient models. Using the algorithms and techniques that machine learning offers, you can automate the building of analytical models.

This book examines a variety of popular machine learning models and algorithms, including k-nearest neighbors, logistic regression, naive Bayes, k-means, decision trees, and artificial neural networks. It discusses data preprocessing, hyperparameter optimization, and ensemble methods. You will build systems that classify documents, recognize images, detect ads, and more. You will learn to use scikit-learn’s API to extract features from categorical variables, text, and images; evaluate model performance; and develop an intuition for how to improve your model’s performance.

By the end of this book, you will have mastered the scikit-learn concepts required to build efficient models and carry out advanced tasks at work, using a practical approach.

Style and Approach

This book is motivated by the belief that you do not understand something until you can describe it simply. Work through toy problems to develop your understanding of the learning algorithms and models, then apply what you have learned to real-life problems.

Downloading the Example Code

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to obtain the code files.

Table of Contents

    1. Preface
      1. What this book covers
      2. What you need for this book
      3. Who this book is for
      4. Conventions
      5. Reader feedback
      6. Customer support
        1. Downloading the example code
        2. Errata
        3. Piracy
        4. Questions
    2. The Fundamentals of Machine Learning
      1. Defining machine learning
      2. Learning from experience
      3. Machine learning tasks
      4. Training data, testing data, and validation data
      5. Bias and variance
      6. An introduction to scikit-learn
      7. Installing scikit-learn
        1. Installing using pip
        2. Installing on Windows
        3. Installing on Ubuntu 16.04
        4. Installing on Mac OS
        5. Installing Anaconda
        6. Verifying the installation
      8. Installing pandas, Pillow, NLTK, and matplotlib
      9. Summary
    3. Simple Linear Regression
      1. Simple linear regression
        1. Evaluating the fitness of the model with a cost function
        2. Solving OLS for simple linear regression
      2. Evaluating the model
      3. Summary
    4. Classification and Regression with k-Nearest Neighbors
      1. K-Nearest Neighbors
      2. Lazy learning and non-parametric models
      3. Classification with KNN
      4. Regression with KNN
        1. Scaling features
      5. Summary
    5. Feature Extraction
      1. Extracting features from categorical variables
      2. Standardizing features
      3. Extracting features from text
        1. The bag-of-words model
        2. Stop word filtering
        3. Stemming and lemmatization
        4. Extending bag-of-words with tf-idf weights
        5. Space-efficient feature vectorizing with the hashing trick
        6. Word embeddings
      4. Extracting features from images
        1. Extracting features from pixel intensities
        2. Using convolutional neural network activations as features
      5. Summary
    6. From Simple Linear Regression to Multiple Linear Regression
      1. Multiple linear regression
      2. Polynomial regression
      3. Regularization
      4. Applying linear regression
        1. Exploring the data
        2. Fitting and evaluating the model
      5. Gradient descent
      6. Summary
    7. From Linear Regression to Logistic Regression
      1. Binary classification with logistic regression
      2. Spam filtering
        1. Binary classification performance metrics
        2. Accuracy
        3. Precision and recall
        4. Calculating the F1 measure
        5. ROC AUC
      3. Tuning models with grid search
      4. Multi-class classification
        1. Multi-class classification performance metrics
      5. Multi-label classification and problem transformation
        1. Multi-label classification performance metrics
      6. Summary
    8. Naive Bayes
      1. Bayes' theorem
      2. Generative and discriminative models
      3. Naive Bayes
        1. Assumptions of Naive Bayes
      4. Naive Bayes with scikit-learn
      5. Summary
    9. Nonlinear Classification and Regression with Decision Trees
      1. Decision trees
      2. Training decision trees
        1. Selecting the questions
          1. Information gain
        2. Gini impurity
      3. Decision trees with scikit-learn
        1. Advantages and disadvantages of decision trees
      4. Summary
    10. From Decision Trees to Random Forests and Other Ensemble Methods
      1. Bagging
      2. Boosting
      3. Stacking
      4. Summary
    11. The Perceptron
      1. The perceptron
        1. Activation functions
        2. The perceptron learning algorithm
        3. Binary classification with the perceptron
        4. Document classification with the perceptron
      2. Limitations of the perceptron
      3. Summary
    12. From the Perceptron to Support Vector Machines
      1. Kernels and the kernel trick
      2. Maximum margin classification and support vectors
      3. Classifying characters in scikit-learn
        1. Classifying handwritten digits
        2. Classifying characters in natural images
      4. Summary
    13. From the Perceptron to Artificial Neural Networks
      1. Nonlinear decision boundaries
      2. Feed-forward and feedback ANNs
      3. Multi-layer perceptrons
      4. Training multi-layer perceptrons
        1. Backpropagation
        2. Training a multi-layer perceptron to approximate XOR
        3. Training a multi-layer perceptron to classify handwritten digits
      5. Summary
    14. K-means
      1. Clustering
      2. K-means
        1. Local optima
        2. Selecting K with the elbow method
      3. Evaluating clusters
      4. Image quantization
      5. Clustering to learn features
      6. Summary
    15. Dimensionality Reduction with Principal Component Analysis
      1. Principal component analysis
        1. Variance, covariance, and covariance matrices
        2. Eigenvectors and eigenvalues
        3. Performing PCA
      2. Visualizing high-dimensional data with PCA
      3. Face recognition with PCA
      4. Summary