CHAPTER 8
MACHINE LEARNING OF BIG DEPENDENT DATA

Classification and discriminant analysis are useful statistical tools with many applications, as shown in Chapter 5. They are widely used in both traditional statistical analysis and modern machine learning. In some cases, they serve as simple statistical tools that leverage modern computing power to process large datasets efficiently. In this chapter, we focus on statistical and machine learning methods that are useful not only for prediction but also for classification and discriminant analysis. In the first part, we introduce tree-based methods, including classification and regression trees (CART) and random forests (RF). These methods are nonparametric in nature: they do not postulate any specific statistical model for the data under study. Instead, they use recursive partitioning, repeatedly splitting the data into subsets according to the values of the predictors, to explore the structure hidden in the dataset. CART has a long history in the statistical literature; see Breiman et al. (1984). With advances in ensemble learning, Breiman (2001) extended CART to RF. In recent years, many articles have addressed Bayesian CART and Bayesian additive regression trees (BART); see, for instance, Chipman et al. (2010) and a recent improvement over BART by He et al. (2018).
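The recursive partitioning idea behind CART can be illustrated with a minimal regression-tree sketch. It is not the book's implementation and not a library API. It makes several simplifying assumptions:

* squared-error loss;
* an exhaustive search over every predictor and every observed split point;
* simple stopping rules based on maximum depth and minimum leaf size.

The names grow, best_split, and predict are hypothetical helpers chosen for illustration. Classification impurity measures and the cost-complexity pruning of Breiman et al. (1984) are omitted.

```python
# Minimal sketch of CART-style recursive partitioning for regression.
# Assumptions (illustrative only): squared-error loss, exhaustive split search,
# and max_depth / min_leaf stopping rules. Pruning and classification are omitted.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    value: float                       # mean response in this region (leaf prediction)
    feature: Optional[int] = None      # splitting variable index; None marks a leaf
    threshold: Optional[float] = None  # go left if x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def sse(ys: List[float]) -> float:
    """Sum of squared deviations from the mean of ys."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(X, y, min_leaf):
    """Search all (feature, threshold) pairs for the smallest total child SSE."""
    best = None  # (total_sse, feature, threshold)
    n, p = len(X), len(X[0])
    for j in range(p):
        # Candidate thresholds: observed values except the largest (which would leave an empty right child).
        for t in sorted(set(row[j] for row in X))[:-1]:
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            total = sse(left) + sse(right)
            if best is None or total < best[0]:
                best = (total, j, t)
    return best


def grow(X, y, depth=0, max_depth=3, min_leaf=2):
    """Recursively partition the data: each call splits one region into two."""
    node = Node(value=sum(y) / len(y))
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return node
    split = best_split(X, y, min_leaf)
    if split is None or split[0] >= sse(y):  # stop when no split reduces the SSE
        return node
    _, j, t = split
    li = [i for i in range(len(y)) if X[i][j] <= t]
    ri = [i for i in range(len(y)) if X[i][j] > t]
    node.feature, node.threshold = j, t
    node.left = grow([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth, min_leaf)
    node.right = grow([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, min_leaf)
    return node


def predict(node, x):
    """Follow the splits from the root to a leaf and return that leaf's mean."""
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value


# Toy data: the response jumps from 1 to 5 between x = 3 and x = 4.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
tree = grow(X, y, max_depth=2, min_leaf=1)
print(tree.feature, tree.threshold)                # 0 3.0
print(predict(tree, [2.5]), predict(tree, [4.5]))  # 1.0 5.0
```

On this toy data, the first split recovers the jump at x = 3. Both child regions are already homogeneous, so recursion stops there. A random forest would, roughly speaking, grow many such trees on bootstrap samples and consider only a random subset of predictors at each split before averaging their predictions.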

Machine learning, or artificial intelligence (AI), is popular nowadays. It leverages the availability of big data (mainly from the Internet of Things), advances in optimization methods, and powerful computers ...
