Discovering Knowledge in Data: An Introduction to Data Mining, 2nd Edition

by Daniel T. Larose
July 2014
Beginner to intermediate
336 pages
9h 30m
English
Wiley

Chapter 6: Preparing to Model the Data

  1. 6.1 Supervised Versus Unsupervised Methods
  2. 6.2 Statistical Methodology and Data Mining Methodology
  3. 6.3 Cross-Validation
  4. 6.4 Overfitting
  5. 6.5 Bias–Variance Trade-Off
  6. 6.6 Balancing the Training Data Set
  7. 6.7 Establishing Baseline Performance
    1. The R Zone
    2. Reference
    3. Exercises

6.1 Supervised Versus Unsupervised Methods

Data mining methods may be categorized as either supervised or unsupervised. In unsupervised methods, no target variable is identified as such. Instead, the data mining algorithm searches for patterns and structure among all the variables. The most common unsupervised data mining method is clustering, our topic for Chapters 10 and 11. For example, political consultants may analyze congressional districts using clustering methods to uncover the locations of voter clusters that may be responsive to a particular candidate's message. In this case, all appropriate variables (e.g., income, race, gender) would be input to the clustering algorithm, with no target variable specified, to develop accurate voter profiles for fund-raising and advertising purposes.
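To make the idea of "no target variable" concrete, here is a minimal sketch of clustering in plain Python. It uses k-means, which is only one of many clustering algorithms and is chosen here for brevity. The district values (median income in thousands of dollars, median age) and the function name `kmeans` are hypothetical and serve only as illustration. In practice the variables would normally be standardized first, which this sketch skips.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means. Returns (centroids, labels); no target variable is used."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen records
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each record to its nearest centroid
        # (squared Euclidean distance over all input variables).
        labels = [
            min(
                range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
            )
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels

# Hypothetical districts: (median income in $1000s, median age).
districts = [(32, 29), (35, 31), (30, 27), (88, 52), (92, 55), (85, 50)]

centroids, labels = kmeans(districts, k=2)
for district, label in zip(districts, labels):
    print(district, "-> cluster", label)
```

Every variable enters the algorithm as an input, and the cluster labels are discovered from the data rather than supplied in advance.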

Another data mining method, which may be supervised or unsupervised, is association rule mining. In market basket analysis, for example, one may simply be interested in “which items are purchased together,” in which case no target variable would be identified. The problem here, of course, is that there are so many items for sale that searching for all possible associations ...
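The "which items are purchased together" question can be sketched as a simple co-occurrence count over item pairs. The baskets and the function name `pair_counts` below are hypothetical, and this is not a full association rule algorithm, only a count of how often each pair appears. The last lines show how quickly the number of candidate pairs grows with the number of items.

```python
from collections import Counter
from itertools import combinations
from math import comb

# Hypothetical transactions: each basket is the set of items in one purchase.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

def pair_counts(transactions):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in transactions:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

counts = pair_counts(baskets)
for pair, n in counts.most_common():
    print(pair, n)

# Number of candidate item pairs for a store with n distinct items.
for n_items in (10, 100, 1000):
    print(n_items, "items ->", comb(n_items, 2), "possible pairs")
```

No item is singled out as a target here. Every pair is treated symmetrically, which is what makes this version of the task unsupervised.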


Publisher Resources

ISBN: 9781118873571