O'Reilly Live Online Training

Next Steps in Data Analysis with R

Revealing the logic and strengths of R for working with data

Rick Scavetta

The big idea in this course is to give students the confidence they need to apply and expand on their basic R knowledge. New R users typically learn from clean data sets and find it daunting to transfer lessons from idealized case studies to the completely new data sets they will encounter in the wild.

Participants should have worked in R before but may lack the confidence needed to approach completely new data analysis problems. The “First Steps in R” Live Online Training course by the same author is an appropriate prerequisite.

What you'll learn and how you can apply it

By the end of this live, hands-on, online course, you’ll understand:

  • How base R and tidyverse functions work together to create workflows
  • How to use basic functions in the tidyverse to process raw data for typical data analysis questions
  • The most common data structures in R:
      • How to access their internal elements by index position or by query
      • How they relate to each other
      • When each is most appropriate
  • How to query data using logical expressions, indexing and regular expressions
  • Common pitfalls with vectorization and indexing

And you’ll be able to:

  • Design an Exploratory Data Analysis (EDA) workflow from scratch
  • Search for and apply functions you have never used before
  • Debug common error messages

This training course is for you because...

  • You have seen a bit of R in action and want to start understanding how it works before delving deeper
  • You are currently learning R and would like compact, guided exercises to refresh and solidify your knowledge
  • You want to build confidence in approaching new data sets

Prerequisites

  • Basic knowledge of data analysis questions and scenarios, e.g., given a data set, what questions would you ask as either the generator or the recipient of the data?
  • Familiarity with basic R commands from base R and/or the tidyverse.

Recommended preparation:

  • An RStudio account is needed for the in-course exercises. A cloud project will be provided with data sets and exercise scripts shortly before the course.

About your instructor

  • Rick Scavetta has worked as an independent data science trainer since 2012. Operating as Scavetta Academy, Rick has a close and recurring presence at primary research institutes all over Germany, including many Max Planck Institutes and Excellence Clusters, in fields as varied as primatology, earth sciences, marine biology, molecular genetics, and behavioral psychology.

Schedule

The time frames are only estimates and may vary according to how the class progresses.

Introduction (20 minutes)

  • Discussion: Review of core concepts of “Data Analysis”
  • Lecture: Descriptive statistics; Inferential statistics; Plotting; Querying data according to specific criteria; Applying transformation and aggregation functions
  • Presentation: Case studies
  • Q&A

Mini-challenges (20 minutes)

  • Lecture: Hurdles to gaining confidence in R
  • Discussion and Exercise: Mini-challenge I: Importing difficult data structures
  • Discussion and Exercise: Mini-challenge II: Dealing with type mismatches
  • Q&A
  • 5-minute break

Case study: Part 1 (60 minutes)

  • Lecture: Introducing a new data set; Exploratory Data Analysis (EDA); Developing a strategy: From scratch to a reportable solution; Descriptive & inferential statistics, plotting; Transforming variables, extracting
  • Discussion: Analytical questions
  • Hands-on exercises: Apply analysis
  • Q&A
  • 5-minute break

Case study: Part 2 (60 minutes)

  • Lecture: Steps in completing our solution; Merging data frames; Working with lists and results of statistics; Reiteration; Indexing, logical expressions and transformation functions
  • Discussion: Strategies for completion and potential problems
  • Hands-on exercises: Apply solutions
  • Q&A

Course wrap-up (10 minutes)

  • Discussion: Closing remarks
  • Q&A