The Path to Predictive Analytics and Machine Learning

Book Description

In many companies today, discussions about predictive analytics and machine learning tend to overlook one critical component: implementation. This report will help you examine practical methods for building and deploying scalable, production-ready machine-learning applications. After all, putting machine-learning models into production is what separates revenue generation and cost savings from mere intellectual novelty.

Product specialists from MemSQL describe several real-time use cases, including "operational" applications, where machine-learning models automate decision-making processes, as well as "interactive" applications, where machine learning informs decisions made by humans. You’ll also explore modern data processing architectures and leading technologies available for data processing, analysis, and visualization.

With this report, you’ll find ways to:

  • Build real-time data pipelines
  • Process transactions and analytics in a single database
  • Create custom real-time dashboards
  • Redeploy batch models in real time
  • Build real-time machine learning applications
  • Prepare data pipelines for predictive analytics and machine learning
  • Apply predictive analytics to real-time challenges
  • Use techniques for predictive analytics in production
  • Move from machine learning to artificial intelligence

Table of Contents

  Introduction
    1. An Anthropological Perspective
  1. Building Real-Time Data Pipelines
    1. Modern Technologies for Going Real-Time
      1. High-Throughput Messaging Systems
      2. Data Transformation
      3. Persistent Datastore
      4. Moving from Data Silos to Real-Time Data Pipelines
      5. The Enterprise Architecture Gap
      6. Real-Time Pipelines and Converged Processing
  2. Processing Transactions and Analytics in a Single Database
    1. Hybrid Data Processing Requirements
    2. Benefits of a Hybrid Data System
      1. New Sources of Revenue
      2. Reducing Administrative and Development Overhead
    3. Data Persistence and Availability
      1. Data Durability
      2. Data Availability
      3. Data Backup
  3. Dawn of the Real-Time Dashboard
    1. Choosing a BI Dashboard
    2. Real-Time Dashboard Examples
      1. Tableau
      2. Zoomdata
      3. Looker
    3. Building Custom Real-Time Dashboards
      1. Database Requirements for Real-Time Dashboards
  4. Redeploying Batch Models in Real Time
    1. Batch Approaches to Machine Learning
    2. Moving to Real Time: A Race Against Time
    3. Manufacturing Example
    4. Original Batch Approach
    5. Real-Time Approach
    6. Technical Integration and Real-Time Scoring
    7. Immediate Benefits from Batch to Real-Time Learning
  5. Applied Introduction to Machine Learning
    1. Supervised Learning
      1. Regression
      2. Classification
    2. Unsupervised Learning
      1. Cluster Analysis
      2. Anomaly Detection
  6. Real-Time Machine Learning Applications
    1. Real-Time Applications of Supervised Learning
      1. Real-Time Scoring
      2. Fast Training and Retraining
    2. Unsupervised Learning
      1. Real-Time Anomaly Detection
      2. Real-Time Clustering
  7. Preparing Data Pipelines for Predictive Analytics and Machine Learning
    1. Real-Time Feature Extraction
    2. Minimizing Data Movement
    3. Dimensionality Reduction
  8. Predictive Analytics in Use
    1. Renewable Energy and Industrial IoT
    2. PowerStream: A Showcase Application of Predictive Analytics for Renewable Energy and IIoT
      1. PowerStream Software Architecture
      2. PowerStream Hardware Configuration
      3. PowerStream Application Introduction
      4. PowerStream Details
      5. Advantages of Spark Coupled with a Distributed, Relational, Memory-Optimized Database
    3. SQL Pushdown Details
    4. PowerStream at the Command Line
  9. Techniques for Predictive Analytics in Production
    1. Real-Time Event Processing
      1. Structuring Semi-Structured Data
    2. Real-Time Data Transformations
      1. Feature Scaling
    3. Real-Time Decision Making
  10. From Machine Learning to Artificial Intelligence
    1. Statistics at the Start
    2. The “Sample Data” Explosion
    3. An Iterative Machine Process
    4. Digging into Deep Learning
      1. Resource Management for Deep Learning
      2. Talent Evolution and Language Resurgence
    5. The Move to Artificial Intelligence
      1. The Intelligent Chatbot
      2. Broader Artificial Intelligence Functions
      3. The Long Road Ahead
  A. Appendix