Data Mining, 4th Edition

Book Description

Data Mining: Practical Machine Learning Tools and Techniques, Fourth Edition, offers a thorough grounding in machine learning concepts, along with practical advice on applying these tools and techniques in real-world data mining situations. This highly anticipated fourth edition of the most acclaimed work on data mining and machine learning teaches readers everything they need to know to get going, from preparing inputs, interpreting outputs, and evaluating results to the algorithmic methods at the heart of successful data mining approaches.

Extensive updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including substantial new chapters on probabilistic methods and on deep learning. Accompanying the book is a new version of the popular WEKA machine learning software from the University of Waikato. Authors Witten, Frank, Hall, and Pal include today's techniques coupled with the methods at the leading edge of contemporary research.

Please visit the book companion website at http://www.cs.waikato.ac.nz/ml/weka/book.html

It contains

  • PowerPoint slides for Chapters 1-12: a comprehensive teaching resource, with many slides covering each chapter of the book
  • An online appendix on the WEKA workbench: a comprehensive learning aid for the open source software that accompanies the book
  • Table of contents, highlighting the many new sections in the 4th edition, along with reviews of the 1st edition, errata, etc.

The book

  • Provides a thorough grounding in machine learning concepts, as well as practical advice on applying the tools and techniques to data mining projects
  • Presents concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methods
  • Includes a downloadable WEKA software toolkit, a comprehensive collection of machine learning algorithms for data mining tasks, in an easy-to-use interactive interface
  • Includes open-access online courses that introduce practical applications of the material in the book

Table of Contents

  1. Cover image
  2. Title page
  3. Table of Contents
  4. Copyright
  5. List of Figures
  6. List of Tables
  7. Preface
    1. Updated and Revised Content
    2. Acknowledgments
  8. Part I: Introduction to data mining
    1. Chapter 1. What’s it all about?
      1. Abstract
      2. 1.1 Data Mining and Machine Learning
      3. 1.2 Simple Examples: The Weather Problem and Others
      4. 1.3 Fielded Applications
      5. 1.4 The Data Mining Process
      6. 1.5 Machine Learning and Statistics
      7. 1.6 Generalization as Search
      8. 1.7 Data Mining and Ethics
      9. 1.8 Further Reading and Bibliographic Notes
    2. Chapter 2. Input: Concepts, instances, attributes
      1. Abstract
      2. 2.1 What’s a Concept?
      3. 2.2 What’s in an Example?
      4. 2.3 What’s in an Attribute?
      5. 2.4 Preparing the Input
      6. 2.5 Further Reading and Bibliographic Notes
    3. Chapter 3. Output: Knowledge representation
      1. Abstract
      2. 3.1 Tables
      3. 3.2 Linear Models
      4. 3.3 Trees
      5. 3.4 Rules
      6. 3.5 Instance-Based Representation
      7. 3.6 Clusters
      8. 3.7 Further Reading and Bibliographic Notes
    4. Chapter 4. Algorithms: The basic methods
      1. Abstract
      2. 4.1 Inferring Rudimentary Rules
      3. 4.2 Simple Probabilistic Modeling
      4. 4.3 Divide-and-Conquer: Constructing Decision Trees
      5. 4.4 Covering Algorithms: Constructing Rules
      6. 4.5 Mining Association Rules
      7. 4.6 Linear Models
      8. 4.7 Instance-Based Learning
      9. 4.8 Clustering
      10. 4.9 Multi-instance Learning
      11. 4.10 Further Reading and Bibliographic Notes
      12. 4.11 Weka Implementations
    5. Chapter 5. Credibility: Evaluating what’s been learned
      1. Abstract
      2. 5.1 Training and Testing
      3. 5.2 Predicting Performance
      4. 5.3 Cross-Validation
      5. 5.4 Other Estimates
      6. 5.5 Hyperparameter Selection
      7. 5.6 Comparing Data Mining Schemes
      8. 5.7 Predicting Probabilities
      9. 5.8 Counting the Cost
      10. 5.9 Evaluating Numeric Prediction
      11. 5.10 The MDL Principle
      12. 5.11 Applying the MDL Principle to Clustering
      13. 5.12 Using a Validation Set for Model Selection
      14. 5.13 Further Reading and Bibliographic Notes
  9. Part II: More advanced machine learning schemes
    1. Part II. More advanced machine learning schemes
    2. Chapter 6. Trees and rules
      1. Abstract
      2. 6.1 Decision Trees
      3. 6.2 Classification Rules
      4. 6.3 Association Rules
      5. 6.4 Weka Implementations
    3. Chapter 7. Extending instance-based and linear models
      1. Abstract
      2. 7.1 Instance-Based Learning
      3. 7.2 Extending Linear Models
      4. 7.3 Numeric Prediction With Local Linear Models
      5. 7.4 Weka Implementations
    4. Chapter 8. Data transformations
      1. Abstract
      2. 8.1 Attribute Selection
      3. 8.2 Discretizing Numeric Attributes
      4. 8.3 Projections
      5. 8.4 Sampling
      6. 8.5 Cleansing
      7. 8.6 Transforming Multiple Classes to Binary Ones
      8. 8.7 Calibrating Class Probabilities
      9. 8.8 Further Reading and Bibliographic Notes
      10. 8.9 Weka Implementations
    5. Chapter 9. Probabilistic methods
      1. Abstract
      2. 9.1 Foundations
      3. 9.2 Bayesian Networks
      4. 9.3 Clustering and Probability Density Estimation
      5. 9.4 Hidden Variable Models
      6. 9.5 Bayesian Estimation and Prediction
      7. 9.6 Graphical Models and Factor Graphs
      8. 9.7 Conditional Probability Models
      9. 9.8 Sequential and Temporal Models
      10. 9.9 Further Reading and Bibliographic Notes
      11. 9.10 Weka Implementations
    6. Chapter 10. Deep learning
      1. Abstract
      2. 10.1 Deep Feedforward Networks
      3. 10.2 Training and Evaluating Deep Networks
      4. 10.3 Convolutional Neural Networks
      5. 10.4 Autoencoders
      6. 10.5 Stochastic Deep Networks
      7. 10.6 Recurrent Neural Networks
      8. 10.7 Further Reading and Bibliographic Notes
      9. 10.8 Deep Learning Software and Network Implementations
      10. 10.9 WEKA Implementations
    7. Chapter 11. Beyond supervised and unsupervised learning
      1. Abstract
      2. 11.1 Semisupervised Learning
      3. 11.2 Multi-instance Learning
      4. 11.3 Further Reading and Bibliographic Notes
      5. 11.4 WEKA Implementations
    8. Chapter 12. Ensemble learning
      1. Abstract
      2. 12.1 Combining Multiple Models
      3. 12.2 Bagging
      4. 12.3 Randomization
      5. 12.4 Boosting
      6. 12.5 Additive Regression
      7. 12.6 Interpretable Ensembles
      8. 12.7 Stacking
      9. 12.8 Further Reading and Bibliographic Notes
      10. 12.9 WEKA Implementations
    9. Chapter 13. Moving on: applications and beyond
      1. Abstract
      2. 13.1 Applying Machine Learning
      3. 13.2 Learning From Massive Datasets
      4. 13.3 Data Stream Learning
      5. 13.4 Incorporating Domain Knowledge
      6. 13.5 Text Mining
      7. 13.6 Web Mining
      8. 13.7 Images and Speech
      9. 13.8 Adversarial Situations
      10. 13.9 Ubiquitous Data Mining
      11. 13.10 Further Reading and Bibliographic Notes
      12. 13.11 WEKA Implementations
  10. Appendix A. Theoretical foundations
    1. A.1 Matrix Algebra
    2. A.2 Fundamental Elements of Probabilistic Methods
  11. Appendix B. The WEKA workbench
    1. B.1 What’s in WEKA?
    2. B.2 The package management system
    3. B.3 The Explorer
    4. B.4 The Knowledge Flow Interface
    5. B.5 The Experimenter
  12. References
  13. Index