CHAPTER 1 INTRODUCTION TO BIG DEPENDENT DATA

Big data are now common everywhere. Statistical methods capable of extracting useful information embedded in those data, including machine learning and artificial intelligence, have attracted much interest among researchers and practitioners in recent years. Most of the available statistical methods for analyzing big data were developed under the assumption that the observations are from independent samples. See, for instance, most methods discussed in Bühlmann and van de Geer (2011). In many applications, however, the observations of big data are dependent. The dependence may arise from the order in which the data were taken (as in time series data) or from the space in which the sampling units reside (as in spatial data). The monthly civilian unemployment rates (16 years and older) of the 50 states in the United States are an example. Unemployment rates tend to be sticky over time, and geographically neighboring states may share similar industries and, hence, have similar unemployment patterns. For dependent data, the spatial and/or temporal dependence is often the focus of statistical analysis. Consequently, there is a need to study the analysis of big dependent data.

The main focus of this book is to provide readers with a comprehensive treatment of statistical methods that can be used to analyze big dependent data. We start with some examples and simple methods for their descriptive analysis. More sophisticated methods will be introduced in other ...
