Skip to Content
Data Mining: Concepts and Techniques, 3rd Edition
book

Data Mining: Concepts and Techniques, 3rd Edition

by Jiawei Han, Micheline Kamber, Jian Pei
June 2011
Beginner to intermediate content levelBeginner to intermediate
744 pages
25h 11m
English
Morgan Kaufmann
Content preview from Data Mining: Concepts and Techniques, 3rd Edition

3.6 Summary

■ Data quality is defined in terms of accuracy, completeness, consistency, timeliness, believability, and interpretabilty. These qualities are assessed based on the intended use of the data.

■ Data cleaning routines attempt to fill in missing values, smooth out noise while identifying outliers, and correct inconsistencies in the data. Data cleaning is usually performed as an iterative two-step process consisting of discrepancy detection and data transformation.

■ Data integration combines data from multiple sources to form a coherent data store. The resolution of semantic heterogeneity, metadata, correlation analysis, tuple duplication detection, and data conflict detection contribute to smooth data integration.

■ Data reduction

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

Practical Statistics for Data Scientists, 2nd Edition

Practical Statistics for Data Scientists, 2nd Edition

Peter Bruce, Andrew Bruce, Peter Gedeck
Fundamentals of Data Engineering

Fundamentals of Data Engineering

Joe Reis, Matt Housley
Fundamentals of Data Engineering

Fundamentals of Data Engineering

Joe Reis, Matt Housley

Publisher Resources

ISBN: 9780123814791