Practical Predictive Analytics

by Ralph Winters
June 2017
Beginner to intermediate
576 pages
15h 22m
English
Packt Publishing
Content preview from Practical Predictive Analytics

Picking out some potential outliers using a third query

Now we will construct a third query that extracts all records that may be considered outliers. For this example, we define an outlier as any record whose age or pressure is more than 1.5 standard deviations above or below the mean for its outcome class. This is accomplished by joining our detail-level data with the aggregated means for age and pressure:

  • We can also compute a new column, agediff, which is the difference between age and average age.

  • We add a limit of 1000 as a protective filter, so that we never retrieve more than that number of results. Placing limits on SQL queries tends to speed up result processing. In this case, one record is returned:

 anomolies <- SparkR::sql("select ...
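
The query text is truncated in this preview, so it is not reconstructed here. As a rough local illustration of the rule described above, the following sketch applies the same idea in base R rather than SparkR: aggregate per outcome class, join back to the detail rows, compute agediff, and keep rows more than 1.5 standard deviations from the class mean. The data is invented, and the column names other than agediff (outcome, age, pressure, avg_age, sd_age, and so on) are assumptions, not the book's actual schema.

```r
# Toy data, invented for illustration only (not the book's dataset).
detail <- data.frame(
  outcome  = c("neg", "neg", "neg", "neg", "neg", "pos", "pos", "pos", "pos", "pos"),
  age      = c(30, 31, 29, 30, 60, 50, 52, 48, 51, 49),
  pressure = c(70, 72, 68, 71, 69, 80, 82, 78, 81, 79)
)

# Aggregate the mean and standard deviation of age and pressure per outcome class.
means <- aggregate(cbind(age, pressure) ~ outcome, data = detail, FUN = mean)
sds   <- aggregate(cbind(age, pressure) ~ outcome, data = detail, FUN = sd)
names(means) <- c("outcome", "avg_age", "avg_pressure")
names(sds)   <- c("outcome", "sd_age", "sd_pressure")

# Join the detail-level rows with the class-level aggregates.
joined <- merge(merge(detail, means, by = "outcome"), sds, by = "outcome")

# Difference between each record's value and its class average.
joined$agediff      <- joined$age - joined$avg_age
joined$pressurediff <- joined$pressure - joined$avg_pressure

# Flag rows more than 1.5 standard deviations above or below the class mean.
is_outlier <- abs(joined$agediff) > 1.5 * joined$sd_age |
  abs(joined$pressurediff) > 1.5 * joined$sd_pressure

# Cap the result at 1000 rows, mirroring the protective limit in the query.
outliers <- head(joined[is_outlier, ], 1000)
print(outliers)
```

With this toy data, only the record with age 60 in the "neg" class is flagged. In a SparkR session the same logic would be expressed in SQL against registered tables, where the limit is applied before results are collected to the driver.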

ISBN: 9781785886188