Publisher Summary
This chapter is about getting familiar with the data. Knowledge about the data is useful for data preprocessing, the first major task of the data mining process. The various attribute types are studied. These include nominal attributes, binary attributes, ordinal attributes, and numeric attributes. Basic statistical descriptions can be used to learn more about each attribute’s values. Given a temperature attribute, one can determine its mean (average value), median (middle value), and mode (most common value). These are measures of central tendency, which give us an idea of the “middle” or center of distribution. Knowing such basic statistics regarding each attribute makes it easier to fill in missing values, smooth noisy values, ...
Get Data Mining: Concepts and Techniques, 3rd Edition now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.