Data Mining, 2nd Edition

Book Description

Data Mining, Second Edition, describes data mining techniques and shows how they work. The book is a major revision of the first edition that appeared in 1999. While the basic core remains the same, it has been updated to reflect the changes that have taken place over five years, and now has nearly double the references.

The highlights of this new edition include thirty new technique sections; an enhanced Weka machine learning workbench, which now features an interactive interface; comprehensive information on neural networks; a new section on Bayesian networks; and much more.

This text is designed for information systems practitioners, programmers, consultants, developers, information technology managers, and specification writers, as well as professors and students of graduate-level data mining and machine learning courses.

  • Algorithmic methods at the heart of successful data mining—including tried-and-true techniques as well as leading-edge methods
  • Performance improvement techniques that work by transforming the input or output

Table of Contents

  1. Cover Image
  2. Content
  3. Title
  4. The Morgan Kaufmann Series in Data Management Systems
  5. Copyright
  6. Foreword
  7. List of Figures
  8. List of Tables
  9. Preface
  10. Updated and revised content
  11. Acknowledgments
  12. PART I: Machine learning tools and techniques
    1. Chapter 1. What's It All About?
      1. 1.1 Data mining and machine learning
      2. 1.2 Simple examples: The weather problem and others
      3. 1.3 Fielded applications
      4. 1.4 Machine learning and statistics
      5. 1.5 Generalization as search
      6. 1.6 Data mining and ethics
      7. 1.7 Further reading
    2. Chapter 2. Input: Concepts, Instances, and Attributes
      1. 2.1 What's a concept?
      2. 2.2 What's in an example?
      3. 2.3 What's in an attribute?
      4. 2.4 Preparing the input
      5. 2.5 Further reading
    3. Chapter 3. Output: Knowledge Representation
      1. 3.1 Decision tables
      2. 3.2 Decision trees
      3. 3.3 Classification rules
      4. 3.4 Association rules
      5. 3.5 Rules with exceptions
      6. 3.6 Rules involving relations
      7. 3.7 Trees for numeric prediction
      8. 3.8 Instance-based representation
      9. 3.9 Clusters
      10. 3.10 Further reading
    4. Chapter 4. Algorithms: The Basic Methods
      1. 4.1 Inferring rudimentary rules
      2. 4.2 Statistical modeling
      3. 4.3 Divide-and-conquer: Constructing decision trees
      4. 4.4 Covering algorithms: Constructing rules
      5. 4.5 Mining association rules
      6. 4.6 Linear models
      7. 4.7 Instance-based learning
      8. 4.8 Clustering
      9. 4.9 Further reading
    5. Chapter 5. Credibility: Evaluating What's Been Learned
      1. 5.1 Training and testing
      2. 5.2 Predicting performance
      3. 5.3 Cross-validation
      4. 5.4 Other estimates
      5. 5.5 Comparing data mining methods
      6. 5.6 Predicting probabilities
      7. 5.7 Counting the cost
      8. 5.8 Evaluating numeric prediction
      9. 5.9 The minimum description length principle
      10. 5.10 Applying the MDL principle to clustering
      11. 5.11 Further reading
    6. Chapter 6. Implementations: Real Machine Learning Schemes
      1. 6.1 Decision trees
      2. 6.2 Classification rules
      3. 6.3 Extending linear models
      4. 6.4 Instance-based learning
      5. 6.5 Numeric prediction
      6. 6.6 Clustering
      7. 6.7 Bayesian networks
    7. Chapter 7. Transformations: Engineering the input and output
      1. 7.1 Attribute selection
      2. 7.2 Discretizing numeric attributes
      3. 7.3 Some useful transformations
      4. 7.4 Automatic data cleansing
      5. 7.5 Combining multiple models
      6. 7.6 Using unlabeled data
      7. 7.7 Further reading
    8. Chapter 8. Moving on: Extensions and Applications
      1. 8.1 Learning from massive datasets
      2. 8.2 Incorporating domain knowledge
      3. 8.3 Text and Web mining
      4. 8.4 Adversarial situations
      5. 8.5 Ubiquitous data mining
      6. 8.6 Further reading
  13. PART II: The Weka machine learning workbench
    1. Chapter 9. Introduction to Weka
      1. 9.1 What's in Weka?
      2. 9.2 How do you use it?
      3. 9.3 What else can you do?
      4. 9.4 How do you get it?
    2. Chapter 10. The Explorer
      1. 10.1 Getting started
      2. 10.2 Exploring the Explorer
      3. 10.3 Filtering algorithms
      4. 10.4 Learning algorithms
      5. 10.5 Metalearning algorithms
      6. 10.6 Clustering algorithms
      7. 10.7 Association-rule learners
      8. 10.8 Attribute selection
    3. Chapter 11. The Knowledge Flow Interface
      1. 11.1 Getting started
      2. 11.2 The Knowledge Flow components
      3. 11.3 Configuring and connecting the components
      4. 11.4 Incremental learning
    4. Chapter 12. The Experimenter
      1. 12.1 Getting started
      2. 12.2 Simple setup
      3. 12.3 Advanced setup
      4. 12.4 The Analyze panel
      5. 12.5 Distributing processing over several machines
    5. Chapter 13. The Command-line Interface
      1. 13.1 Getting started
      2. 13.2 The structure of Weka
      3. 13.3 Command-line options
    6. Chapter 14. Embedded Machine Learning
      1. 14.1 A simple data mining application
      2. 14.2 Going through the code
    7. Chapter 15. Writing New Learning Schemes
      1. 15.1 An example classifier
      2. 15.2 Conventions for implementing classifiers
  14. Index
  15. About the Authors