Machine Learning with Spark and Python, 2nd Edition

Name: Machine Learning with Spark and Python, 2nd Edition
Author: Michael Bowles
ISBN: 9781119561934

November 2019

Intermediate to advanced

368 pages

9h 53m

English

Wiley

Cover
Introduction
  Who This Book Is For
  What This Book Covers
  What Has Changed Since the First Edition
  How This Book Is Structured
  What You Need to Use This Book
  Reader Support for This Book
CHAPTER 1: The Two Essential Algorithms for Making Predictions
  Why Are These Two Algorithms So Useful?
  What Are Penalized Regression Methods?
  What Are Ensemble Methods?
  How to Decide Which Algorithm to Use
  The Process Steps for Building a Predictive Model
  Chapter Contents and Dependencies
  Summary
  References
CHAPTER 2: Understand the Problem by Understanding the Data
  The Anatomy of a New Problem
  Classification Problems: Detecting Unexploded Mines Using Sonar
  Visualizing Properties of the Rocks Versus Mines Data Set
  Real-Valued Predictions with Factor Variables: How Old Is Your Abalone?
  Real-Valued Predictions Using Real-Valued Attributes: Calculate How Your Wine Tastes
  Multiclass Classification Problem: What Type of Glass Is That?
  Using PySpark to Understand Large Data Sets
  Summary
  Reference
CHAPTER 3: Predictive Model Building: Balancing Performance, Complexity, and Big Data
  The Basic Problem: Understanding Function Approximation
  Factors Driving Algorithm Choices and Performance—Complexity and Data
  Measuring the Performance of Predictive Models
  Achieving Harmony between Model and Data
  Using PySpark for Training Penalized Regression Models on Extremely Large Data Sets
  Summary
  Reference
CHAPTER 4: Penalized Linear Regression
  Why Penalized Linear Regression Methods Are So Useful
  Penalized Linear Regression: Regulating Linear Regression for Optimum Performance
  Solving the Penalized Linear Regression Problem
  Extension of Linear Regression to Classification Problems
  Summary
  References
CHAPTER 5: Building Predictive Models Using Penalized Linear Methods
  Python Packages for Penalized Linear Regression
  Multivariable Regression: Predicting Wine Taste
  Binary Classification: Using Penalized Linear Regression to Detect Unexploded Mines
  Multiclass Classification: Classifying Crime Scene Glass Samples
  Linear Regression and Classification Using PySpark
  Using PySpark to Predict Wine Taste
  Logistic Regression with PySpark: Rocks Versus Mines
  Incorporating Categorical Variables in a PySpark Model: Predicting Abalone Rings
  Multiclass Logistic Regression with Meta Parameter Optimization
  Summary
  References
CHAPTER 6: Ensemble Methods
  Binary Decision Trees
  Bootstrap Aggregation: “Bagging”
  Gradient Boosting
  Random Forests
  Summary
  References
CHAPTER 7: Building Ensemble Models with Python
  Solving Regression Problems with Python Ensemble Packages
  Incorporating Non-Numeric Attributes in Python Ensemble Models
  Solving Binary Classification Problems with Python Ensemble Methods
  Solving Multiclass Classification Problems with Python Ensemble Methods
  Solving Regression Problems with PySpark Ensemble Packages
  Summary
  References
Index

End User License Agreement

Content preview from Machine Learning with Spark and Python, 2nd Edition

CHAPTER 4: Penalized Linear Regression

As you saw in Chapter 3, “Predictive Model Building: Balancing Performance, Complexity, and Big Data,” getting linear regression to work in practice requires some manipulation of the ordinary least squares algorithm. Ordinary least squares regression cannot temper its use of all the data available in an attempt to minimize the error on the training data. Chapter 3 illustrated that this situation can lead to models that perform much worse on test data than on the training data. Chapter 3 showed two extensions of ordinary least squares regression: forward stepwise regression and ridge regression. Both of these involved judiciously reducing the amount of data available to ordinary least squares and using out-of-sample error measurement to determine how much data resulted in the best performance.

Stepwise regression began by letting ordinary least squares regression use exactly one of the attribute columns for making predictions and by picking the best one. It proceeded by recursively adding a single additional column of attributes to those already being used in the model.
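
The stepwise procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the book's own listing: the function name `forward_stepwise`, the synthetic data, and the choice to score each candidate column by root-mean-square error on a held-out set are all assumptions made for the example.

```python
import numpy as np

def forward_stepwise(X_train, y_train, X_test, y_test):
    """Greedy forward selection (hypothetical helper).

    Returns the order in which columns were added and the held-out RMSE
    after each addition.
    """
    n_cols = X_train.shape[1]
    selected, rmse_path = [], []
    while len(selected) < n_cols:
        best_col, best_rmse = None, np.inf
        for col in range(n_cols):
            if col in selected:
                continue
            cols = selected + [col]
            # Ordinary least squares on the candidate subset, plus an intercept.
            A_train = np.column_stack([np.ones(len(y_train)), X_train[:, cols]])
            coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
            A_test = np.column_stack([np.ones(len(y_test)), X_test[:, cols]])
            rmse = np.sqrt(np.mean((A_test @ coef - y_test) ** 2))
            if rmse < best_rmse:
                best_col, best_rmse = col, rmse
        selected.append(best_col)
        rmse_path.append(best_rmse)
    return selected, rmse_path

# Synthetic data (an assumption): only columns 2 and 0 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 2] + 0.5 * X[:, 0] + rng.normal(scale=0.1, size=200)

selected, rmse_path = forward_stepwise(X[:150], y[:150], X[150:], y[150:])
# Number of columns that gave the lowest held-out error.
best_size = int(np.argmin(rmse_path)) + 1
print("column order:", selected)
print("best subset size:", best_size)
```

The held-out error path plays the role of the out-of-sample measurement: it indicates how many columns to keep before extra attributes stop helping.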

Ridge regression introduced a different type of constraint: it imposed a penalty on the magnitude of the coefficients to constrict the solution. Both ridge regression and forward stepwise regression gave better performance than ordinary least squares (OLS) on example problems.
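
To make the coefficient penalty concrete, here is a small sketch of ridge regression using the standard closed-form solution, in which the penalty weight `alpha` is added to the diagonal of the normal equations. The helper name `ridge_fit`, the synthetic data, and the list of `alpha` values are assumptions for illustration; the book may use different notation or a library routine.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression (hypothetical helper).

    Centers the data so the intercept is left unpenalized, then solves
    (Xc^T Xc + alpha * I) coef = Xc^T yc.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n_cols = X.shape[1]
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_cols), Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept

# Synthetic data (an assumption) with known coefficients.
rng = np.random.default_rng(0)
true_coef = np.array([2.0, -1.0, 0.0, 0.5])
X = rng.normal(size=(100, 4))
y = X @ true_coef + rng.normal(scale=0.1, size=100)

# Larger alpha means a heavier penalty and smaller coefficients.
alphas = [0.0, 1.0, 10.0, 100.0]
norms = [float(np.linalg.norm(ridge_fit(X, y, a)[0])) for a in alphas]
for a, n in zip(alphas, norms):
    print(f"alpha={a:>6}: coefficient norm = {n:.3f}")
```

With `alpha = 0` the result is the ordinary least squares fit. As `alpha` grows, the coefficient vector shrinks toward zero, and out-of-sample error is the usual guide for choosing a value in between.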

This chapter develops an extended family of methods for controlling the overfitting inherent in ...
