Python Machine Learning By Example - Second Edition

Grasp machine learning concepts, techniques, and algorithms with the help of real-world examples using Python libraries such as TensorFlow and scikit-learn

Key Features

  • Exploit the power of Python to explore the world of data mining and data analytics
  • Discover machine learning algorithms to solve complex challenges faced by data scientists today
  • Use Python libraries such as TensorFlow and Keras to create smart cognitive actions for your projects

Book Description

Interest in machine learning (ML) has surged because it revolutionizes automation by learning patterns in data and using them to make predictions and decisions. If you're interested in ML, this book will serve as your entry point.

Python Machine Learning By Example begins with an introduction to important ML concepts and implementations using Python libraries. Each chapter of the book walks you through an industry-adopted application. You'll implement ML techniques in areas such as exploratory data analysis, feature engineering, and natural language processing (NLP) in a clear and easy-to-follow way.

With the help of this extended and updated edition, you'll understand how to tackle data-driven problems and implement your solutions with the powerful yet simple Python language and popular Python packages and tools such as TensorFlow, scikit-learn, gensim, and Keras. To aid your understanding of popular ML algorithms, the book covers interesting and easy-to-follow examples such as news topic modeling and classification, spam email detection, stock price forecasting, and more.

By the end of the book, you'll have put together a broad picture of the ML ecosystem and will be well versed in the best practices of applying ML techniques to make the most of new opportunities.

What you will learn

  • Understand the important concepts in machine learning and data science
  • Use Python to explore the world of data mining and analytics
  • Scale up model training on data of varying complexity with Apache Spark
  • Delve deep into text and NLP using Python libraries such as NLTK and gensim
  • Select and build an ML model and evaluate and optimize its performance
  • Implement ML algorithms from scratch in Python, TensorFlow, and scikit-learn

Who this book is for

If you're a machine learning aspirant, data analyst, or data engineer who is passionate about machine learning and wants to begin working on ML assignments, this book is for you. Prior knowledge of Python coding is assumed; basic familiarity with statistical concepts is beneficial but not necessary.

Downloading the example code for this ebook: You can download the example code files for this ebook from GitHub at the following link: https://github.com/PacktPublishing/Python-Machine-Learning-By-Example-Second-Edition. If you require support, please email customercare@packt.com.

Table of Contents

  1. Title Page
  2. Copyright and Credits
    1. Python Machine Learning By Example Second Edition
  3. About Packt
    1. Why subscribe?
    2. Packt.com
  4. Dedication
  5. Foreword
  6. Contributors
    1. About the author
    2. About the reviewer
    3. Packt is searching for authors like you
  7. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Download the color images
      3. Conventions used
    4. Get in touch
      1. Reviews
  8. Section 1: Fundamentals of Machine Learning
  9. Getting Started with Machine Learning and Python
    1. Defining machine learning and why we need it
    2. A very high-level overview of machine learning technology
      1. Types of machine learning tasks
      2. A brief history of the development of machine learning algorithms
    3. Core of machine learning – generalizing with data
      1. Overfitting, underfitting, and the bias-variance trade-off
      2. Avoiding overfitting with cross-validation
      3. Avoiding overfitting with regularization
      4. Avoiding overfitting with feature selection and dimensionality reduction
    4. Preprocessing, exploration, and feature engineering
      1. Missing values
      2. Label encoding
      3. One hot encoding
      4. Scaling
      5. Polynomial features
      6. Power transform
      7. Binning
    5. Combining models
      1. Voting and averaging
      2. Bagging
      3. Boosting
      4. Stacking
    6. Installing software and setting up
      1. Setting up Python and environments
      2. Installing the various packages
        1. NumPy
        2. SciPy
        3. Pandas
        4. Scikit-learn
        5. TensorFlow
    7. Summary
    8. Exercises
  10. Section 2: Practical Python Machine Learning By Example
  11. Exploring the 20 Newsgroups Dataset with Text Analysis Techniques
    1. How computers understand language - NLP
    2. Picking up NLP basics while touring popular NLP libraries
      1. Corpus
      2. Tokenization
      3. PoS tagging
      4. Named-entity recognition
      5. Stemming and lemmatization
      6. Semantics and topic modeling
    3. Getting the newsgroups data
    4. Exploring the newsgroups data
    5. Thinking about features for text data
      1. Counting the occurrence of each word token
      2. Text preprocessing
      3. Dropping stop words
      4. Stemming and lemmatizing words
    6. Visualizing the newsgroups data with t-SNE
      1. What is dimensionality reduction?
      2. t-SNE for dimensionality reduction
    7. Summary
    8. Exercises
  12. Mining the 20 Newsgroups Dataset with Clustering and Topic Modeling Algorithms
    1. Learning without guidance – unsupervised learning
    2. Clustering newsgroups data using k-means
      1. How does k-means clustering work?
      2. Implementing k-means from scratch
      3. Implementing k-means with scikit-learn
      4. Choosing the value of k
      5. Clustering newsgroups data using k-means
    3. Discovering underlying topics in newsgroups
    4. Topic modeling using NMF
    5. Topic modeling using LDA
    6. Summary
    7. Exercises
  13. Detecting Spam Email with Naive Bayes
    1. Getting started with classification
      1. Types of classification
      2. Applications of text classification
    2. Exploring Naïve Bayes
      1. Learning Bayes' theorem by examples
      2. The mechanics of Naïve Bayes
      3. Implementing Naïve Bayes from scratch
      4. Implementing Naïve Bayes with scikit-learn
    3. Classification performance evaluation
    4. Model tuning and cross-validation
    5. Summary
    6. Exercise
  14. Classifying Newsgroup Topics with Support Vector Machines
    1. Finding separating boundary with support vector machines
      1. Understanding how SVM works through different use cases
        1. Case 1 – identifying a separating hyperplane
        2. Case 2 – determining the optimal hyperplane
        3. Case 3 – handling outliers
      2. Implementing SVM
        1. Case 4 – dealing with more than two classes
      3. The kernels of SVM
        1. Case 5 – solving linearly non-separable problems
      4. Choosing between linear and RBF kernels
    2. Classifying newsgroup topics with SVMs
    3. More example – fetal state classification on cardiotocography
    4. A further example – breast cancer classification using SVM with TensorFlow
    5. Summary
    6. Exercise
  15. Predicting Online Ad Click-Through with Tree-Based Algorithms
    1. Brief overview of advertising click-through prediction
    2. Getting started with two types of data – numerical and categorical
    3. Exploring decision tree from root to leaves
      1. Constructing a decision tree
      2. The metrics for measuring a split
    4. Implementing a decision tree from scratch
    5. Predicting ad click-through with decision tree
    6. Ensembling decision trees – random forest
      1. Implementing random forest using TensorFlow
    7. Summary
    8. Exercise
  16. Predicting Online Ad Click-Through with Logistic Regression
    1. Converting categorical features to numerical – one-hot encoding and ordinal encoding
    2. Classifying data with logistic regression
      1. Getting started with the logistic function
      2. Jumping from the logistic function to logistic regression
    3. Training a logistic regression model
      1. Training a logistic regression model using gradient descent
      2. Predicting ad click-through with logistic regression using gradient descent
      3. Training a logistic regression model using stochastic gradient descent
      4. Training a logistic regression model with regularization
    4. Training on large datasets with online learning
    5. Handling multiclass classification
    6. Implementing logistic regression using TensorFlow
    7. Feature selection using random forest
    8. Summary
    9. Exercises
  17. Scaling Up Prediction to Terabyte Click Logs
    1. Learning the essentials of Apache Spark
      1. Breaking down Spark
      2. Installing Spark
      3. Launching and deploying Spark programs
    2. Programming in PySpark
    3. Learning on massive click logs with Spark
      1. Loading click logs
      2. Splitting and caching the data
      3. One-hot encoding categorical features
      4. Training and testing a logistic regression model
    4. Feature engineering on categorical variables with Spark
      1. Hashing categorical features
      2. Combining multiple variables – feature interaction
    5. Summary
    6. Exercises
  18. Stock Price Prediction with Regression Algorithms
    1. Brief overview of the stock market and stock prices
    2. What is regression?
    3. Mining stock price data
      1. Getting started with feature engineering
      2. Acquiring data and generating features
    4. Estimating with linear regression
      1. How does linear regression work?
      2. Implementing linear regression
    5. Estimating with decision tree regression
      1. Transitioning from classification trees to regression trees
      2. Implementing decision tree regression
      3. Implementing regression forest
    6. Estimating with support vector regression
      1. Implementing SVR
    7. Estimating with neural networks
      1. Demystifying neural networks
      2. Implementing neural networks
    8. Evaluating regression performance
    9. Predicting stock price with four regression algorithms
    10. Summary
    11. Exercise
  19. Section 3: Python Machine Learning Best Practices
  20. Machine Learning Best Practices
    1. Machine learning solution workflow
    2. Best practices in the data preparation stage
      1. Best practice 1 – completely understanding the project goal
      2. Best practice 2 – collecting all fields that are relevant
      3. Best practice 3 – maintaining the consistency of field values
      4. Best practice 4 – dealing with missing data
      5. Best practice 5 – storing large-scale data
    3. Best practices in the training sets generation stage
      1. Best practice 6 – identifying categorical features with numerical values
      2. Best practice 7 – deciding on whether or not to encode categorical features
      3. Best practice 8 – deciding on whether or not to select features, and if so, how to do so
      4. Best practice 9 – deciding on whether or not to reduce dimensionality, and if so, how to do so
      5. Best practice 10 – deciding on whether or not to rescale features
      6. Best practice 11 – performing feature engineering with domain expertise
      7. Best practice 12 – performing feature engineering without domain expertise
      8. Best practice 13 – documenting how each feature is generated
      9. Best practice 14 – extracting features from text data
    4. Best practices in the model training, evaluation, and selection stage
      1. Best practice 15 – choosing the right algorithm(s) to start with
        1. Naïve Bayes
        2. Logistic regression
        3. SVM
        4. Random forest (or decision tree)
        5. Neural networks
      2. Best practice 16 – reducing overfitting
      3. Best practice 17 – diagnosing overfitting and underfitting
      4. Best practice 18 – modeling on large-scale datasets
    5. Best practices in the deployment and monitoring stage
      1. Best practice 19 – saving, loading, and reusing models
      2. Best practice 20 – monitoring model performance
      3. Best practice 21 – updating models regularly
    6. Summary
    7. Exercises
  21. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think