2 Simple Exploratory Data Analysis
After reading a dataset, the first step is usually to identify the main characteristics of the data and make sense of them. This means understanding how the data are organized, what types they have, and what values they contain. For numerical data, simple statistical summaries can be computed; these are usually called descriptive statistics and typically include the arithmetic mean, the median, the maximum and minimum values, and the quartiles. More detailed statistical information can, of course, also be derived from a series of numerical values.
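As a rough illustration of the descriptive statistics listed above, the following minimal sketch computes them with Python's standard `statistics` module. The `describe` helper and the sample values are hypothetical and chosen only for this example; quartiles can be defined in several ways, and the "inclusive" method used here is just one of them.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a sequence of numbers."""
    data = sorted(values)
    # quantiles() with n=4 returns the three cut points Q1, Q2 (median), Q3.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "count": len(data),
        "mean": statistics.mean(data),
        "min": data[0],
        "q1": q1,
        "median": statistics.median(data),
        "q3": q3,
        "max": data[-1],
    }


# Hypothetical sample; one value is deliberately far from the others.
sample = [12, 15, 11, 19, 14, 95, 13]
summary = describe(sample)

for name, value in summary.items():
    print(f"{name}: {value}")
```

In this sample the mean is noticeably larger than the median, and the maximum lies far above the third quartile. Gaps like these are the kind of signal that a quick summary can surface without inspecting every value.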
This activity is often called simple exploratory data analysis. The adjective “simple” distinguishes this basic, quick analysis, performed to grasp the main features of a dataset, from thorough exploratory data analyses performed with more sophisticated statistical tools and methods.
However, the modest requirements of this initial approach to a new dataset do not make it unimportant. On the contrary, the basic descriptive statistics offered by common tools may reveal important features of a dataset that help decide how to proceed, expose anomalous values, or point to specific data wrangling operations to perform. The information provided by a simple exploratory data analysis therefore deserves careful attention. Real datasets are typically too large to be visually inspected; therefore, ...