3

Intermediate Data Processing

The previous chapter covered a suite of commonly used dplyr functions for data processing. For example, to characterize a dataset and extract its summary statistics, we can follow the split-apply-combine procedure using group_by() and summarize(). This chapter builds on that foundation and focuses on intermediate data processing techniques, including transforming categorical and numeric variables and reshaping DataFrames. We will also introduce string manipulation techniques for working with textual data, whose format is fundamentally different from the neatly shaped tables we have worked with so far.
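As a brief refresher, the split-apply-combine procedure mentioned above can be sketched as follows. This is a minimal illustration only: the `scores` data frame and its column names are hypothetical and do not come from the book.

```r
library(dplyr)

# Hypothetical toy data (an assumption for illustration, not from the book)
scores <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  score = c(90, 80, 70, 60, 50)
)

group_means <- scores %>%
  group_by(group) %>%                  # split: one partition per group
  summarize(
    mean_score = mean(score),          # apply: mean within each group
    n = n()                            # apply: row count within each group
  )                                    # combine: one row per group

print(group_means)
```

Here group_by() defines the partitions and summarize() collapses each partition into a single row, so the result has one row per distinct value of `group`.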

By the end of this chapter, you will be able to perform more advanced data ...

