Practical Predictive Analytics by Ralph Winters

Extracting a subset

Another typical task is to first extract the records that share a particular characteristic (such as patients whose disposition is "Expired") and then perform a similar grouping to understand where the differences lie. In this next example, we use the dplyr package to first extract those patients admitted to a hospital who died, and then count them and sum their Total.Costs within each major diagnostic category (APR.MDC.Description). Finally, we order the results from the most expensive diagnosis to the least expensive. As you can see from the results, infectious diseases have the highest total costs associated with them:

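library(dplyr)

# Keep only patients who died, then count them and total their costs
# within each major diagnostic category (MDC)
df %>%
  filter(as.character(Patient.Disposition) == "Expired") %>%
  group_by(APR.MDC.Description) %>%
  summarize(total.count = n(), TotalCosts = sum(Total.Costs)) ...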
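
The pipeline can be tried end to end on a tiny made-up data frame. The sketch below is an assumption, not the book's data: the data frame `toy`, its category labels ("Infectious", "Circulatory") and its cost figures are all hypothetical, and the final ordering step is written here with dplyr's arrange() and desc(), which is one way to sort the summary from most to least expensive.

```r
library(dplyr)

# Hypothetical toy data that mimics the column names used in the text;
# Patient.Disposition is a factor, which is why the filter calls as.character()
toy <- data.frame(
  Patient.Disposition = factor(c("Expired", "Home or Self Care", "Expired",
                                 "Expired", "Home or Self Care")),
  APR.MDC.Description = c("Infectious", "Infectious", "Circulatory",
                          "Infectious", "Circulatory"),
  Total.Costs = c(50000, 8000, 30000, 20000, 12000),
  stringsAsFactors = FALSE
)

expired_costs <- toy %>%
  # keep only the patients who died
  filter(as.character(Patient.Disposition) == "Expired") %>%
  # one group per major diagnostic category
  group_by(APR.MDC.Description) %>%
  # number of deaths and summed costs in each category
  summarize(total.count = n(), TotalCosts = sum(Total.Costs)) %>%
  # assumed ordering step: most expensive category first
  arrange(desc(TotalCosts))

print(expired_costs)
```

With these made-up rows, "Infectious" comes first with two deaths and a TotalCosts of 70000, followed by "Circulatory" with one death and 30000.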