June 2017
576 pages
After an initial inspection of the data, it is a good idea to look at summary statistics of the target variable broken down by some of the categories (or factors). We could do this in SQL; for this example, however, we will use the dplyr package. Its syntax is SQL-like, and its habit of chaining operations with a pipe should make it easy to pick up for anyone familiar with SQL and/or Linux.
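To give a feel for that SQL-like flavor, here is a minimal sketch on a made-up data frame. The data frame toy and its columns weekday and cost are hypothetical names invented for this illustration, not the dataset used in this chapter; only the dplyr functions group_by, summarise, and n, and the %>% pipe, come from the package itself.

```r
library(dplyr)

# Toy data: names and values are invented purely for illustration.
toy <- data.frame(
  weekday = c("Mon", "Mon", "Tue", "Tue", "Tue"),
  cost    = c(100, 300, 50, 150, 250)
)

# Roughly equivalent SQL:
#   SELECT weekday, COUNT(*) AS n, AVG(cost) AS mean_cost
#   FROM toy GROUP BY weekday
cost_by_day <- toy %>%
  group_by(weekday) %>%                        # one group per weekday
  summarise(n = n(), mean_cost = mean(cost))   # one summary row per group

print(cost_by_day)
```

Each step receives the result of the previous one through %>%, much as a shell pipeline passes output from one command to the next.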
One of our goals is to break down Total.Costs by some of the factors to see whether costs differ among the levels. Let's start with something easy and break out Total.Costs by the day of the week. We will do this by piping the df dataframe to the dplyr group_by command, which will then send it to a ...