The Path to Predictive Analytics and Machine Learning

by Conor Doherty, Steven Camina, Kevin White, Gary Orenstein

October 2016

Intermediate to advanced

87 pages

1h 50m

English

O'Reilly Media, Inc.

Introduction
An Anthropological Perspective
1. Building Real-Time Data Pipelines
Modern Technologies for Going Real-Time; High-Throughput Messaging Systems; Data Transformation; Persistent Datastore; Moving from Data Silos to Real-Time Data Pipelines; The Enterprise Architecture Gap; Real-Time Pipelines and Converged Processing
2. Processing Transactions and Analytics in a Single Database
Hybrid Data Processing Requirements; Benefits of a Hybrid Data System; New Sources of Revenue; Reducing Administrative and Development Overhead; Data Persistence and Availability; Data Durability; Data Availability; Data Backup
3. Dawn of the Real-Time Dashboard
Choosing a BI Dashboard; Real-Time Dashboard Examples; Tableau; Zoomdata; Looker; Building Custom Real-Time Dashboards; Database Requirements for Real-Time Dashboards
4. Redeploying Batch Models in Real Time
Batch Approaches to Machine Learning; Moving to Real Time: A Race Against Time; Manufacturing Example; Original Batch Approach; Real-Time Approach; Technical Integration and Real-Time Scoring; Immediate Benefits from Batch to Real-Time Learning
5. Applied Introduction to Machine Learning
Supervised Learning; Regression; Classification; Unsupervised Learning; Cluster Analysis; Anomaly Detection
6. Real-Time Machine Learning Applications
Real-Time Applications of Supervised Learning; Real-Time Scoring; Fast Training and Retraining; Unsupervised Learning; Real-Time Anomaly Detection; Real-Time Clustering
7. Preparing Data Pipelines for Predictive Analytics and Machine Learning
Real-Time Feature Extraction; Minimizing Data Movement; Dimensionality Reduction
8. Predictive Analytics in Use
Renewable Energy and Industrial IoT; PowerStream: A Showcase Application of Predictive Analytics for Renewable Energy and IIoT; PowerStream Software Architecture; PowerStream Hardware Configuration; PowerStream Application Introduction; PowerStream Details; Advantages of Spark Coupled with a Distributed, Relational, Memory-Optimized Database; SQL Pushdown Details; PowerStream at the Command Line
9. Techniques for Predictive Analytics in Production
Real-Time Event Processing; Structuring Semi-Structured Data; Real-Time Data Transformations; Feature Scaling; Real-Time Decision Making
10. From Machine Learning to Artificial Intelligence
Statistics at the Start; The “Sample Data” Explosion; An Iterative Machine Process; Digging into Deep Learning; Resource Management for Deep Learning; Talent Evolution and Language Resurgence; The Move to Artificial Intelligence; The Intelligent Chatbot; Broader Artificial Intelligence Functions; The Long Road Ahead
A. Appendix

Chapter 1. Building Real-Time Data Pipelines

Discussions of predictive analytics and machine learning often gloss over the details of a difficult but crucial component of business success: implementation. The ability to use machine learning models in production is what separates revenue generation and cost savings from mere intellectual novelty. In addition to providing an overview of the theoretical foundations of machine learning, this book discusses pragmatic concerns related to building and deploying scalable, production-ready machine learning applications. There is a heavy focus on real-time use cases. These include operational applications, in which a machine learning model automates a decision-making process, and interactive applications, in which machine learning informs a decision made by a human.

Given this book's focus on implementing and deploying predictive analytics applications, it is important to establish context around the technologies and architectures that will be used in production. Beyond the theoretical advantages and limitations of particular techniques, business decision makers need to understand the systems in which machine learning applications will be deployed. The interactive tools data scientists use to develop models, including domain-specific languages like R, generally do not suit low-latency production environments. Deploying models in production forces businesses to consider factors like model training ...

ISBN: 9781492042884
