2 Summarizing Statistical Data

In this chapter, we explore some of the procedures available in R to summarize statistical data, and we give some examples of writing programs.

2.1 Measures of Central Tendency

Measures of central tendency are typical or central points in the data. The most commonly used are the mean and the median.

Mean: The mean is the sum of all values divided by the number of cases, excluding the missing values.
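As a small illustration of the phrase "excluding the missing values": R codes missing values as NA, and mean() returns NA when any are present unless the argument na.rm = TRUE is supplied. The vector x below is hypothetical and is not part of the book's data; it is only a sketch of the behavior.

```r
# Hypothetical data with one missing value (not from Example 1.1)
x <- c(10, 20, NA, 30)

# By default, a single NA makes the result NA
mean_with_na <- mean(x)

# na.rm = TRUE drops missing values before summing and counting
mean_without_na <- mean(x, na.rm = TRUE)

# The same calculation written out from the definition:
# sum of the non-missing values divided by the number of non-missing cases
mean_by_hand <- sum(x, na.rm = TRUE) / sum(!is.na(x))

print(mean_with_na)
print(mean_without_na)
print(mean_by_hand)
```

The last two results agree, which is the sense in which the mean "excludes" missing values.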

To obtain the mean of the data in Example 1.1 stored in downtime, write

 mean(downtime) 
 [1] 25.04348 

So the average downtime of all the computers in the laboratory is just over 25 minutes.

Going back to the original data in Exercise 1.1 stored in marks, to obtain the mean, write

 mean(marks) 

which gives

 [1] 57.44 

To obtain the mean marks for females, write

 mean(marks[1:23])
 [1] 65.86957

For males,

 mean(marks[24:50])
 [1] 50.25926

illustrating that the female average is substantially higher than the male average.
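Indexing by position works here because the first 23 marks belong to females and the remaining 27 to males. One possible alternative, sketched below, is to attach a group label to each mark and let tapply() compute the mean within each group. The names toy_marks and toy_gender and the six values are hypothetical, chosen only to keep the example self-contained; they are not the data from Exercise 1.1.

```r
# Hypothetical marks for six students (not the data from Exercise 1.1);
# the first three are assumed to be female and the last three male
toy_marks <- c(70, 65, 60, 55, 50, 45)
toy_gender <- factor(rep(c("female", "male"), times = c(3, 3)))

# tapply applies mean() to the marks within each level of the factor
group_means <- tapply(toy_marks, toy_gender, mean)
print(group_means)

# For the 50 marks in the text, the matching labels would be
# factor(rep(c("female", "male"), times = c(23, 27)))
```

The advantage of labelling is that the group means no longer depend on remembering where one group ends and the next begins.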

To obtain the mean of the corrected data in Exercise 1.1, recall that the mark of 86 for the 34th student on the list was an error, and that it should have been 46. We changed it with

 marks[34] <- 46 

The new overall average is

 mean(marks)
 [1] 56.64

and the new male average is

 mean(marks[24:50])
[1] 48.77778 

increasing the gap between the male and female averages even further.
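The size of these changes can be checked by hand: replacing 86 with 46 lowers the total by 40, so a mean over n values falls by 40/n. The sketch below uses only the figures quoted above (50 students overall, 27 in positions 24 to 50); the variable names are illustrative, and the old male mean is entered as printed, so the result is accurate only to the printed rounding.

```r
# Figures quoted in the text
n_all <- 50                 # students overall
n_male <- 27                # students in positions 24 to 50
old_mean_all <- 57.44
old_mean_male <- 50.25926   # rounded as printed

# The correction lowers one mark from 86 to 46
change <- 46 - 86

# A change of `change` in the total shifts the mean by change / n
new_mean_all <- old_mean_all + change / n_all
new_mean_male <- old_mean_male + change / n_male

print(new_mean_all)
print(round(new_mean_male, 5))
```

The female mean is unaffected because the 34th student falls in the male block, which is why the gap widens.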

If we perform a similar operation for the variables in the examination data given ...
