Machine Learning for Streaming Data with Python

Book description

Apply machine learning to streaming data with the help of practical examples, and deal with the challenges that surround streaming.

Key Features

  • Work on streaming use cases that are not taught in most data science courses
  • Gain experience with state-of-the-art tools for streaming data
  • Mitigate various challenges while handling streaming data

Book Description

Streaming data is an increasingly important area of data science and machine learning. As business needs become more demanding, many use cases require real-time analysis as well as real-time machine learning. This book will help you get up to speed with data analytics for streaming data, with a strong focus on adapting machine learning and other analytics to the streaming setting.

You will first learn about the architecture for streaming and real-time machine learning. Next, you will look at state-of-the-art frameworks for streaming data, such as River. Later chapters focus on industrial use cases for streaming data, such as online anomaly detection. As you progress, you will discover various challenges and learn how to mitigate them. You will also learn best practices that help you use streaming data to generate real-time insights.

By the end of this book, you will have gained the confidence you need to apply your machine learning models to streaming data.

What you will learn

  • Understand the challenges and advantages of working with streaming data
  • Develop real-time insights from streaming data
  • Understand how streaming data is implemented across various use cases
  • Develop a PCA alternative that can work on real-time data
  • Explore best practices for handling streaming data that you absolutely need to remember
  • Develop an API for real-time machine learning inference

Who this book is for

This book is for data scientists and machine learning engineers who have a background in machine learning, are practice- and technology-oriented, and want to learn how to apply machine learning to streaming data through practical examples with modern technologies. Although an understanding of basic Python and machine learning concepts is a must, no prior knowledge of streaming is required.

Table of contents

  1. Machine Learning for Streaming Data with Python
  2. Contributors
  3. About the author
  4. About the reviewer
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Download the example code files
    5. Download the color images
    6. Conventions used
    7. Get in touch
    8. Share Your Thoughts
  6. Part 1: Introduction and Core Concepts of Streaming Data
  7. Chapter 1: An Introduction to Streaming Data
    1. Technical requirements
      1. Setting up a Python environment
    2. A short history of data science
    3. Working with streaming data
      1. Streaming data versus batch data
      2. Advantages of streaming data
      3. Examples of successful implementation of streaming analytics
      4. Challenges of streaming data
      5. How to get started with streaming data
      6. Common use cases for streaming data
      7. Streaming versus big data
    4. Real-time data formats and importing an example dataset in Python
    5. Summary
    6. Further reading
  8. Chapter 2: Architectures for Streaming and Real-Time Machine Learning
    1. Technical requirements
      1. Python environment
    2. Defining your analytics as a function
    3. Understanding microservices architecture
    4. Communicating between services through APIs
    5. Demystifying the HTTP protocol
      1. The GET request
      2. The POST request
      3. JSON format for communication between systems
      4. RESTful APIs
    6. Building a simple API on AWS
      1. API Gateway in AWS
      2. Lambda in AWS
      3. Data-generating process on a local machine
      4. Implementing the example
      5. More architectural considerations
      6. Other AWS services and other services in general that have the same functionality
    7. Big data tools for real-time streaming
      1. Calling a big data environment in real time
    8. Summary
    9. Further reading
  9. Chapter 3: Data Analysis on Streaming Data
    1. Technical requirements
      1. Python environment
    2. Descriptive statistics on streaming data
      1. Why are descriptive statistics different on streaming data?
    3. Introduction to sampling theory
      1. Comparing population and sample
      2. Population parameters and sample statistics
      3. Sampling distribution
      4. Sample size calculations and confidence level
      5. Rolling descriptive statistics from streaming
      6. Exponential weight
      7. Tracking convergence as an additional KPI
    4. Overview of the main descriptive statistics
      1. The mean
      2. The median
      3. The mode
      4. Standard deviation
      5. Variance
      6. Quartiles and interquartile range
      7. Correlations
    5. Real-time visualizations
      1. Opening the dashboard
      2. Comparing Plotly's Dash and other real-time visualization tools
    6. Building basic alerting systems
      1. Alerting systems on extreme values
      2. Alerting systems on process stability (mean and median)
      3. Alerting systems on constant variability (std and variance)
      4. Basic alerting systems using statistical process control
    7. Summary
    8. Further reading
  10. Part 2: Exploring Use Cases for Data Streaming
  11. Chapter 4: Online Learning with River
    1. Technical requirements
      1. Python environment
    2. What is online machine learning?
      1. How is online learning different from regular learning?
      2. Advantages of online learning
      3. Challenges of online learning
      4. Types of online learning
    3. Using River for online learning
      1. Training an online model with River
      2. Improving the model evaluation
      3. Building a multiclass classifier using one-vs-rest
    4. Summary
    5. Further reading
  12. Chapter 5: Online Anomaly Detection
    1. Technical requirements
      1. Python environment
    2. Defining anomaly detection
      1. Are outliers a problem?
    3. Exploring use cases of anomaly detection
      1. Fraud detection in financial institutions
      2. Anomaly detection on your log data
      3. Fault detection in manufacturing and production lines
      4. Hacking detection in computer networks (cyber security)
      5. Medical risks in health data
      6. Predictive maintenance and sensor data
    4. Comparing anomaly detection and imbalanced classification
      1. The problem of imbalanced data
      2. The F1 score
      3. SMOTE oversampling
      4. Anomaly detection versus classification
    5. Algorithms for detecting anomalies in River
      1. The use of thresholders in River anomaly detection
      2. Anomaly detection algorithm 1 – One-Class SVM
      3. Anomaly detection algorithm 2 – Half-Space-Trees
    6. Going further with anomaly detection
    7. Summary
    8. Further reading
  13. Chapter 6: Online Classification
    1. Technical requirements
      1. Python environment
    2. Defining classification
    3. Identifying use cases of classification
      1. Use case 1 – email spam classification
      2. Use case 2 – face detection in phone camera
      3. Use case 3 – online marketing ad selection
    4. Overview of classification algorithms in River
      1. Classification algorithm 1 – LogisticRegression
      2. Classification algorithm 2 – Perceptron
      3. Classification algorithm 3 – AdaptiveRandomForestClassifier
      4. Classification algorithm 4 – ALMAClassifier
      5. Classification algorithm 5 – PAClassifier
      6. Evaluating benchmark results
    5. Summary
    6. Further reading
  14. Chapter 7: Online Regression
    1. Technical requirements
      1. Python environment
    2. Defining regression
    3. Use cases of regression
      1. Use case 1 – Forecasting
      2. Use case 2 – Predicting the number of faulty products in manufacturing
    4. Overview of regression algorithms in River
      1. Regression algorithm 1 – LinearRegression
      2. Regression algorithm 2 – HoeffdingAdaptiveTreeRegressor
      3. Regression algorithm 3 – SGTRegressor
      4. Regression algorithm 4 – SRPRegressor
    5. Summary
    6. Further reading
  15. Chapter 8: Reinforcement Learning
    1. Technical requirements
      1. Python environment
    2. Defining reinforcement learning
      1. Comparing online and offline reinforcement learning
      2. A more detailed overview of feedback loops in reinforcement learning
    3. The main steps of a reinforcement learning model
      1. Making the decisions
      2. Updating the decision rules
    4. Exploring Q-learning
      1. The goal of Q-learning
      2. Parameters of the Q-learning algorithm
    5. Deep Q-learning
    6. Using reinforcement learning for streaming data
    7. Use cases of reinforcement learning
      1. Use case one – trading system
      2. Use case two – social network ranking system
      3. Use case three – a self-driving car
      4. Use case four – chatbots
      5. Use case five – learning games
    8. Implementing reinforcement learning in Python
    9. Summary
    10. Further reading
  16. Part 3: Advanced Concepts and Best Practices around Streaming Data
  17. Chapter 9: Drift and Drift Detection
    1. Technical requirements
      1. Python environment
    2. Defining drift
      1. Three types of drift
    3. Introducing model explicability
    4. Measuring drift
      1. Measuring data drift
      2. Measuring concept drift
    5. Measuring drift in Python
      1. A basic intuitive approach to measuring drift
      2. Measuring drift with robust tools
    6. Counteracting drift
      1. Offline learning with retraining strategies against drift
      2. Online learning against drift
    7. Summary
    8. Further reading
  18. Chapter 10: Feature Transformation and Scaling
    1. Technical requirements
      1. Python environment
    2. Challenges of data preparation with streaming data
    3. Scaling data for streaming
      1. Introducing scaling
      2. Adapting scaling to a streaming context
    4. Transforming features in a streaming context
      1. Introducing PCA
      2. Mathematical definition of PCA
      3. Regular PCA in Python
      4. Incremental PCA for streaming
    5. Summary
    6. Further reading
  19. Chapter 11: Catastrophic Forgetting
    1. Technical requirements
      1. Python environment
    2. Introducing catastrophic forgetting
    3. Catastrophic forgetting in online models
    4. Detecting catastrophic forgetting
      1. Using Python to detect catastrophic forgetting
    5. Model explicability versus catastrophic forgetting
      1. Explaining models using linear coefficients
      2. Explaining models using dendrograms
      3. Explaining models using variable importance
    6. Summary
    7. Further reading
  20. Chapter 12: Conclusion and Best Practices
    1. Going further
    2. Summary
    3. Why subscribe?
  21. Other Books You May Enjoy
    1. Packt is searching for authors like you
    2. Share Your Thoughts

Product information

  • Title: Machine Learning for Streaming Data with Python
  • Author(s): Joos Korstanje
  • Release date: July 2022
  • Publisher(s): Packt Publishing
  • ISBN: 9781803248363