Data transformation and discretization
As the previous section showed, specific data mining algorithms work best with particular data formats. Data transformation converts the original data into the format a given algorithm prefers before the algorithm processes it.
Data transformation
Data transformation routines convert the data into forms appropriate for mining. The routines are as follows:
- Smoothing: This uses binning, regression, or clustering to remove noise from the data
- Attribute construction: In this routine, new attributes are constructed from the given set of attributes and added to the data
- Aggregation: In this routine, summary or aggregation operations are performed on the data
- Normalization
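The first three routines in the list can be sketched in a few lines of Python. This is a minimal illustration, not the book's implementation: the function names (smooth_by_bin_means, add_area, total_by_key), the record fields (width, height, region, sales), and the choice of equal-frequency binning with bin means are all assumptions made for the example.

```python
from collections import defaultdict


def smooth_by_bin_means(values, bin_size):
    """Smoothing (assumed variant: equal-frequency binning with bin means).

    Sorts the values, splits them into consecutive bins of `bin_size`
    items, and replaces every value with the mean of its bin.
    """
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed


def add_area(records):
    """Attribute construction: derive a new `area` attribute
    from the existing (hypothetical) `width` and `height` attributes."""
    return [dict(record, area=record["width"] * record["height"])
            for record in records]


def total_by_key(records, key, field):
    """Aggregation: sum `field` over all records that share the same `key`."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record[field]
    return dict(totals)
```

For example, smooth_by_bin_means([4, 8, 15, 21, 21, 24, 25, 28, 34], 3) replaces the values in each three-item bin with that bin's mean, giving 9.0, 22.0, and 29.0 for the three bins. Smoothing by regression or clustering would follow the same pattern of replacing raw values with fitted or cluster-level values.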