Predictive Analytics and Data Mining

by Vijay Kotu, Bala Deshpande

November 2014

Beginner to intermediate

446 pages

12h 16m

English

Morgan Kaufmann

Cover image
Title page
Table of Contents
Copyright
Dedication
Foreword
Preface
Acknowledgments
Chapter 1. Introduction
1.1. What Data Mining Is
1.2. What Data Mining is Not
1.3. The Case for Data Mining
1.4. Types of Data Mining
1.5. Data Mining Algorithms
1.6. Roadmap for Upcoming Chapters
Chapter 2. Data Mining Process
2.1. Prior Knowledge
2.2. Data Preparation
2.3. Modeling
2.4. Application
2.5. Knowledge
What’s Next?

Chapter 3. Data Exploration
3.1. Objectives of Data Exploration
3.2. Data Sets
3.3. Descriptive Statistics
3.4. Data Visualization
3.5. Roadmap for Data Exploration
Chapter 4. Classification
4.1. Decision Trees
4.2. Rule Induction
4.3. k-Nearest Neighbors
4.4. Naïve Bayesian
4.5. Artificial Neural Networks
4.6. Support Vector Machines
4.7. Ensemble Learners
Chapter 5. Regression Methods
5.1. Linear Regression
5.2. Logistic Regression
Conclusion
Chapter 6. Association Analysis
6.1. Concepts of Mining Association Rules
6.2. Apriori Algorithm
6.3. FP-Growth Algorithm
Conclusion
Chapter 7. Clustering
Clustering to Describe the Data
Clustering for Preprocessing
7.1. Types of Clustering Techniques
7.2. k-Means Clustering
7.3. DBSCAN Clustering
Chapter 8. Model Evaluation
8.1. Confusion Matrix (or Truth Table)
8.2. Receiver Operator Characteristic (ROC) Curves and Area under the Curve (AUC)
8.3. Lift Curves
8.4. Evaluating The Predictions: Implementation
Conclusion
Chapter 9. Text Mining
9.1. How Text Mining Works
9.2. Implementing Text Mining with Clustering and Classification
Conclusion
Chapter 10. Time Series Forecasting
10.1. Data-Driven Approaches
10.2. Model-Driven Forecasting Methods
Conclusion
Chapter 11. Anomaly Detection
11.1. Anomaly Detection Concepts
11.2. Distance-Based Outlier Detection
11.3. Density-Based Outlier Detection
11.4. Local Outlier Factor
Conclusion
Chapter 12. Feature Selection
12.1. Classifying Feature Selection Methods
12.2. Principal Component Analysis
12.3. Information Theory–Based Filtering for Numeric Data
12.4. Chi-Square-Based Filtering for Categorical Data
12.5. Wrapper-Type Feature Selection
Conclusion
Chapter 13. Getting Started with RapidMiner
13.1. User Interface and Terminology
13.2. Data Importing and Exporting Tools
13.3. Data Visualization Tools
13.4. Data Transformation Tools
13.5. Sampling and Missing Value Tools
13.6. Optimization Tools
Conclusion
Comparison of Data Mining Algorithms
Index
About the Authors

Content preview from Predictive Analytics and Data Mining

Chapter 7

Clustering

Abstract

Clustering is an unsupervised data mining technique that organizes the records in a data set into logical groupings, such that records within a group are more similar to one another than to records outside the group. Its applications range from market and customer segmentation to electoral grouping, web analytics, and outlier detection. Clustering also serves as a data compression technique and as a preprocessing step for supervised data mining tasks. Many clustering approaches are available; they are based on proximity between the records, density in the data set, or novel applications of neural networks. ...
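
To make the proximity-based idea concrete, here is a minimal sketch of k-means clustering (the technique listed as section 7.2) in plain Python. It is an illustration under stated assumptions, not the book's implementation: the function name `k_means`, the six sample points, and the choice of starting centroids are all hypothetical.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def k_means(points, centroids, iterations=10):
    """Hypothetical helper: group points around the nearest centroid.

    Each pass assigns every point to its closest centroid, then moves
    each centroid to the mean of the points assigned to it.
    """
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for each point.
        labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        # Update step: recompute each centroid as the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Illustrative data (assumed): two visually separate groups of 2-D records.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = k_means(points, centroids=[points[0], points[3]])
print(labels)     # cluster index assigned to each record
print(centroids)  # final cluster centers
```

Records that end up with the same label are closer to their shared centroid than to the other one, which is the "more similar inside the group than outside" property the abstract describes. Density-based methods such as DBSCAN (section 7.3) use a different criterion and are not covered by this sketch.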

ISBN: 9780128014608
