Predictive Analytics and Data Mining

by Vijay Kotu, Bala Deshpande

November 2014

Beginner to intermediate

446 pages

12h 16m

English

Morgan Kaufmann

Cover image
Title page
Table of Contents
Copyright
Dedication
Foreword
Preface
Acknowledgments
Chapter 1. Introduction
1.1. What Data Mining Is
1.2. What Data Mining is Not
1.3. The Case for Data Mining
1.4. Types of Data Mining
1.5. Data Mining Algorithms
1.6. Roadmap for Upcoming Chapters
Chapter 2. Data Mining Process
2.1. Prior Knowledge
2.2. Data Preparation
2.3. Modeling
2.4. Application
2.5. Knowledge
What’s Next?

Chapter 3. Data Exploration
3.1. Objectives of Data Exploration
3.2. Data Sets
3.3. Descriptive Statistics
3.4. Data Visualization
3.5. Roadmap for Data Exploration
Chapter 4. Classification
4.1. Decision Trees
4.2. Rule Induction
4.3. k-Nearest Neighbors
4.4. Naïve Bayesian
4.5. Artificial Neural Networks
4.6. Support Vector Machines
4.7. Ensemble Learners
Chapter 5. Regression Methods
5.1. Linear Regression
5.2. Logistic Regression
Conclusion
Chapter 6. Association Analysis
6.1. Concepts of Mining Association Rules
6.2. Apriori Algorithm
6.3. FP-Growth Algorithm
Conclusion
Chapter 7. Clustering
Clustering to Describe the Data
Clustering for Preprocessing
7.1. Types of Clustering Techniques
7.2. k-Means Clustering
7.3. DBSCAN Clustering
Chapter 8. Model Evaluation
8.1. Confusion Matrix (or Truth Table)
8.2. Receiver Operator Characteristic (ROC) Curves and Area under the Curve (AUC)
8.3. Lift Curves
8.4. Evaluating The Predictions: Implementation
Conclusion
Chapter 9. Text Mining
9.1. How Text Mining Works
9.2. Implementing Text Mining with Clustering and Classification
Conclusion
Chapter 10. Time Series Forecasting
10.1. Data-Driven Approaches
10.2. Model-Driven Forecasting Methods
Conclusion
Chapter 11. Anomaly Detection
11.1. Anomaly Detection Concepts
11.2. Distance-Based Outlier Detection
11.3. Density-Based Outlier Detection
11.4. Local Outlier Factor
Conclusion
Chapter 12. Feature Selection
12.1. Classifying Feature Selection Methods
12.2. Principal Component Analysis
12.3. Information Theory–Based Filtering for Numeric Data
12.4. Chi-Square-Based Filtering for Categorical Data
12.5. Wrapper-Type Feature Selection
Conclusion
Chapter 13. Getting Started with RapidMiner
13.1. User Interface and Terminology
13.2. Data Importing and Exporting Tools
13.3. Data Visualization Tools
13.4. Data Transformation Tools
13.5. Sampling and Missing Value Tools
13.6. Optimization Tools
Conclusion
Comparison of Data Mining Algorithms
Index
About the Authors

Content preview from Predictive Analytics and Data Mining

Chapter 8

Model Evaluation

Abstract

This chapter describes three commonly used tools for evaluating the performance of a classification algorithm. We first introduce the confusion matrix and define several terms used in conjunction with it, such as sensitivity, specificity, and recall. We then describe how to construct receiver operating characteristic (ROC) curves and show when it is appropriate to use them, along with the area under the curve (AUC) concept. Finally, we present lift and gain charts and show how to construct and interpret them. The RapidMiner implementation includes step-by-step processes for building each of these three evaluation tools.
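
The three tools named in the abstract can be illustrated with a minimal Python sketch. This is not the book's RapidMiner process; the function names (`confusion_counts`, `auc`, `lift_at`), the toy labels and scores, and the 0.5 threshold are all assumptions made for illustration, and the sketch assumes both classes are present in the data.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification result."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """True positive rate; recall is the same quantity."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate."""
    return tn / (tn + fp)


def auc(actual, scores, positive=1):
    """Rank-based AUC: the chance that a random positive record scores
    higher than a random negative record (ties count as one half)."""
    pos = [s for a, s in zip(actual, scores) if a == positive]
    neg = [s for a, s in zip(actual, scores) if a != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def lift_at(actual, scores, fraction, positive=1):
    """Lift in the top `fraction` of records ranked by score: the positive
    rate in that slice divided by the overall positive rate."""
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(round(fraction * len(ranked))))
    top_rate = sum(1 for _, a in ranked[:k] if a == positive) / k
    base_rate = sum(1 for a in actual if a == positive) / len(actual)
    return top_rate / base_rate


# Hypothetical toy data: true labels and model confidence scores.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.1]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tp, fp, tn, fn = confusion_counts(actual, predicted)
print("sensitivity:", sensitivity(tp, fn))       # 0.75
print("specificity:", specificity(tn, fp))       # 0.75
print("AUC:", auc(actual, scores))                # 0.875
print("lift at top 50%:", lift_at(actual, scores, 0.5))  # 1.5
```

The confusion matrix depends on the chosen threshold, whereas AUC and lift are computed from the ranking of scores, which is why they are often reported alongside threshold-specific measures.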

Keywords

Model evaluation; classification performance; ...

Publisher Resources

ISBN: 9780128014608
