Summarizing Functions
Often, you are provided with data that is too fine-grained for your analysis. For example, you might be analyzing data about a website. Suppose that you wanted to know the average number of pages delivered to each user. To find the answer, you might need to look at every HTTP transaction (every request for content), grouping requests into sessions and counting the number of requests in each. R provides a number of different functions for summarizing data, aggregating records together to build a smaller data set.
tapply, aggregate
The tapply function is a very flexible function for summarizing a vector X. You can specify which subsets of X to summarize as well as the function used for summarization:
tapply(X, INDEX, FUN = , ..., simplify = )
Here are the arguments to tapply.
Argument | Description | Default |
---|---|---|
X | The object on which to apply the function (usually a vector). | |
INDEX | A list of factors, each the same length as X, that specify the different sets of values of X over which to calculate FUN. | |
FUN | The function applied to elements of X. | NULL |
... | Optional arguments passed to FUN. | |
simplify | If simplify=TRUE and FUN always returns a scalar, tapply returns an array with the mode of the scalar. If simplify=FALSE, tapply returns an array of mode list. | TRUE |
For example, we can use tapply to sum the number of home runs by team:
> tapply(X=batting.2008$HR,INDEX=list(batting.2008$teamID),FUN=sum)
ARI ATL BAL BOS CHA CHN CIN CLE COL DET FLO HOU KCA LAA LAN MIL MIN
159 130 172 173 235 184 187 171 160 ...
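If the batting.2008 data set is not at hand, the same pattern can be tried on a tiny stand-in. The sketch below assumes a made-up data frame, batting.toy, whose player IDs, team codes, and home run counts are invented for illustration and are not taken from the real data.

```r
# Made-up stand-in for a batting table; all values are illustrative only.
batting.toy <- data.frame(
  playerID = c("p1", "p2", "p3", "p4", "p5"),
  teamID   = c("AAA", "AAA", "BBB", "BBB", "CCC"),
  HR       = c(12, 3, 25, 0, 7)
)

# Sum home runs within each team, mirroring the call above.
hr.by.team <- tapply(X = batting.toy$HR,
                     INDEX = list(batting.toy$teamID),
                     FUN = sum)
print(hr.by.team)
```

The result has one element per distinct team code, named by that code, so an individual total can be looked up with an expression such as hr.by.team[["AAA"]].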