5 Numerical Summary and Groupby Analysis

We use numerical summary methods to examine measures of central tendency (mean, median) and dispersion (variance, standard deviation), other measures such as skewness and kurtosis, and features of the distribution (max, min, interquartile range), in order to find estimated and expected values when analyzing numerical variables. We use groupby to slice and dice the data for further analysis. In R we have the concept of tidy data:

  1. Each variable forms a column.
  2. Each observation forms a row.
  3. Each type of observational unit forms a table.

By default all datasets in SAS are tidy.
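As a rough illustration of the summary measures listed above, the following sketch computes them for one numerical variable held in a tidy layout (one row per observation, one key per variable). It is written in Python with only the standard library, purely as a neutral illustration; the `observations` data, the `score` variable, and the `summarize` helper are invented for this example. The skewness and kurtosis shown are plain moment-based versions, which is an assumption of the sketch.

```python
import statistics

# Hypothetical tidy data: each dict is one observation (row) and
# each key is one variable (column). The values are invented.
observations = [
    {"id": 1, "score": 2.0},
    {"id": 2, "score": 4.0},
    {"id": 3, "score": 4.0},
    {"id": 4, "score": 4.0},
    {"id": 5, "score": 5.0},
    {"id": 6, "score": 5.0},
    {"id": 7, "score": 7.0},
    {"id": 8, "score": 9.0},
]


def summarize(values):
    """Return common numerical summaries for a list of numbers."""
    n = len(values)
    mean = statistics.mean(values)
    # Central moments (dividing by n), used for the shape measures.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    # Quartiles; the interquartile range is Q3 - Q1.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std_dev": statistics.stdev(values),      # sample standard deviation
        "skewness": m3 / m2 ** 1.5,               # moment-based, no bias correction
        "kurtosis": m4 / m2 ** 2 - 3,             # excess kurtosis, no bias correction
        "min": min(values),
        "max": max(values),
        "iqr": q3 - q1,
    }


scores = [row["score"] for row in observations]
summary = summarize(scores)
```

Statistical packages may apply bias corrections to skewness and kurtosis and may use a different quantile definition, so their output can differ slightly from this sketch.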

Also in R we have the concept of Split-Apply-Combine, which refers to the following steps:

  1. Split the data into pieces.
  2. Apply some function to each piece.
  3. Combine the results back together again.
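The three steps above can be sketched in a few lines. This is a minimal illustration in Python using only the standard library, not the R or SAS implementation; the `rows` data, the column names `sex` and `height`, and the helper `split_apply_combine` are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def split_apply_combine(rows, key, value, func):
    """Group rows by `key`, apply `func` to each group's `value` list."""
    # 1. Split: collect the values of each group into its own list.
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    # 2. Apply `func` to each piece, and
    # 3. Combine the per-group results into one dictionary.
    return {group: func(values) for group, values in groups.items()}


# Hypothetical tidy data: one row per observation.
rows = [
    {"sex": "F", "height": 60.0},
    {"sex": "F", "height": 62.0},
    {"sex": "F", "height": 64.0},
    {"sex": "M", "height": 66.0},
    {"sex": "M", "height": 70.0},
]

mean_height = split_apply_combine(rows, "sex", "height", mean)
```

Passing a different function, such as `max` or `len`, changes only the apply step; the split and combine steps stay the same.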

5.1 Numerical Summary and Groupby Analysis

Here we use different functions and procedures to do analysis on numerical data.

5.2 Numerical Summary and Groupby Analysis in SAS

In SAS, we use the class statement for grouping and the var statement to specify the particular variables to analyze:

  • proc means,

Proc means is one of the most commonly used procedures for analyzing data. It calculates descriptive statistics such as the mean, standard deviation, maximum, and minimum, along with the number of observations. These five are the statistics that proc means displays by default; it can be customized to report other descriptive statistics.
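To make the default output concrete, the sketch below imitates in Python what such a grouped summary contains: the number of observations, mean, standard deviation, minimum, and maximum of one analysis variable for each level of a grouping variable. It is not SAS code and does not reproduce the procedure's report layout; the function `means_by_class`, the data, and the column names are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def means_by_class(rows, class_var, analysis_var):
    """Five descriptive statistics of `analysis_var` per level of `class_var`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[class_var]].append(row[analysis_var])

    result = {}
    for level, values in groups.items():
        result[level] = {
            "n": len(values),
            "mean": mean(values),
            # Sample standard deviation (divisor n - 1); undefined for n < 2.
            "std_dev": stdev(values) if len(values) > 1 else None,
            "minimum": min(values),
            "maximum": max(values),
        }
    return result


# Hypothetical data: `sex` plays the role of the grouping variable and
# `height` the role of the analysis variable.
rows = [
    {"sex": "F", "height": 60.0},
    {"sex": "F", "height": 62.0},
    {"sex": "F", "height": 64.0},
    {"sex": "M", "height": 66.0},
    {"sex": "M", "height": 70.0},
]

stats = means_by_class(rows, "sex", "height")
```

Here the `class_var` argument corresponds loosely to the class statement and `analysis_var` to the var statement.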

Note: the sashelp library in SAS is like the default datasets package in R, as both provide test datasets for ...
