The approach to data science solutions involves the following staged approach:
Knowledge mining (discovery) in datasets is an interactive and iterative process involving several steps for identifying valid, useful, and understandable patterns in data.
- Data processing: Data cleansing pre-transforms the raw data into an easy and convenient format for usage; a few related tasks are:
- Sampling, that is, selecting representative subsets from a large population of data
- Remove noise
- Missing data handling for incomplete rows
- Normalization of data
- Feature extraction: data useful in a particular context is extracted
- Data ...