Chapter 11
On Data Mining
This chapter introduces basic concepts and tools in data mining.
Topics Covered
- What is data mining and big data?
- Data reduction and visualization
- Data preparation
- Classification
- Classification and regression trees
Learning Outcomes
After studying this chapter, the reader will be able to
- Understand the need for data mining.
- Apply data reduction techniques.
- Use data visualization and data preparation methods.
- Classify data using appropriate tools.
- Construct both classification and regression trees.
- Use the R software to perform classification and regression tree analyses.
11.1 Introduction
Previously, we studied a variety of probability and statistics topics, starting from a basic level suitable for readers with no previous knowledge and progressing to more advanced statistical techniques. Specifically, in Chapter 2, we learned to classify data into various classes and to derive descriptive statistics such as the mean, variance, and standard deviation. We also learned to identify outliers that may be present in a set of data. Afterward, we focused on various distributions, in particular the normal distribution, where we learned how to standardize random variables. This led to a discussion of estimation and hypothesis testing, and of some basic concepts in reliability theory. In Chapters 15 and 16, we will discuss regression analysis and logistic regression, where we will learn ...