Analysts often spend 50-80% of their time preparing and transforming data sets before they begin more formal analysis work. This video tutorial shows you how to streamline your code—and your thinking—by introducing a set of principles and R packages that make this work much faster and easier. Garrett Grolemund, Data Scientist and Master Instructor at RStudio, demonstrates how R and its packages help you tackle three main issues:
- Data Manipulation. Data sets contain more information than they display. By transforming your data, you can reveal a wealth of descriptive statistics, group-level observations, and hidden variables. R’s dplyr package provides optimized functions to help you transform data, as well as a pipe syntax that makes R code more concise and intuitive.
- Data Tidying. Data sets come in many formats, but R prefers just one. R runs quickly and intuitively when your data is stored in the tidy format, in which each variable is a column and each observation is a row. This layout allows vectorized programming. R’s tidyr package reshapes the layout of your data sets, making them tidy while preserving the relationships they contain.
- Data Visualization. The structure of data visualizations parallels the structure of data sets. Once your data is tidy, visualizations become straightforward: each observation in your data set becomes a mark on a graph, and each variable becomes a visual property of the marks. This mapping forms a grammar of graphics that lets you create thousands of different graphs. R’s ggvis package implements the grammar, providing a system of data visualization for R.
Garrett Grolemund is a Data Scientist and Master Instructor at RStudio. Garrett maintains the lubridate R package and is the author of Hands-On Programming with R and the upcoming Data Science with R (both O’Reilly books).
Table of contents
- The dplyr Package 00:02:24
- Select Variables 00:08:29
- Filter Observations 00:10:03
- Derive Variables 00:06:03
- Summarize Observations 00:08:41
- Group Observations 00:17:37
- Re-Arrange Observations 00:05:08
- Case Study 1 - TB Counts 00:08:27
- Data Science for Data Wranglers, Part 2 - Units of Analysis 00:14:32
- Data Science for Data Wranglers, Part 3 - Tidy Data 00:10:46
- Reshape the Layout of Your Data 00:18:14
- Separate and Unite Variables 00:06:51
- Data Science for Data Wranglers, Part 4 - The Best Format 00:17:43
- Combine Data Sets 00:16:33
- Case Study 2 - TB Rates 00:09:08
- Data Science for Data Wranglers, Part 5 - The Structure of Visualizations 00:05:53
- Visualize Observations 00:08:29
- Visualize Variables 00:17:04
- How to Learn More 00:09:35
- Title: Expert Data Wrangling with R
- Release date: February 2015
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781491917046