Video description
Overview
Alternative Backends for R LiveLessons teaches R programmers techniques for dealing with large data, both in memory and in databases.
Description
In this video training, Jared starts with some common data manipulation operations using various base R functions and packages like plyr, comparing the speed of in-memory calculations. He then demonstrates more advanced techniques for accomplishing the same tasks, such as data.table, dplyr, Rcpp and parallel computation, for increased speed. Finally, for when data size is an even bigger factor than speed, he introduces external memory and database techniques using bigmemory, ff, SciDB, dplyr and Hadoop.
About the Instructor
Jared P. Lander is the Founder and CEO of Lander Analytics, the Organizer of the New York Open Statistical Programming Meetup and an Adjunct Professor of Statistics at Columbia University. With a master's from Columbia University in statistics and a bachelor's from Muhlenberg College in mathematics, he has experience in both academic research and industry. Jared oversees the long-term direction of the company and acts as Lead Data Scientist, researching the best strategy, models and algorithms for modern data needs. This is in addition to his client-facing consulting and training. He specializes in data management, multilevel models, machine learning, generalized linear models, visualization and statistical computing. He is the author of R for Everyone, a book about R programming geared toward data scientists and non-statisticians alike. The book is available from Amazon, Barnes & Noble, and InformIT. The material is drawn from the classes he teaches at Columbia and is incorporated into his corporate training. Very active in the data community, Jared is a frequent speaker at conferences, universities and meetups around the world. He is a member of the 2014 Strata New York selection committee.
Skill Level
- Intermediate
- Advanced
What You Will Learn
- Basic Aggregation
- plyr
- dplyr
- data.table
- Rcpp
- Parallel Processing
- Code Benchmarking
Who Should Take This Course
- R programmers who already have an intermediate level of knowledge, such as that gained from reading R for Everyone.
Course Requirements
- Basic Programming Skills
- Proficiency in R, including working with packages
Table of Contents
Lesson 1: Reading XML Data
1.1. Read HTML Table
1.2. Use XPath for complex searches in HTML
1.3. xmlToList for easier parsing
Lesson 2: Faster Group Operations
2.1. Aggregate normally
2.2. tapply
2.3. ddply
2.4. data.table
2.5. dplyr
2.6. ddply parallel
2.7. foreach
2.8. dplyr with a database
Lesson 3: Rcpp for Faster Code
3.1. Basics of C++ with R
3.2. Writing a C++ function for R
3.3. Using C++ code in an R package
Lesson 4: Advanced Machine Learning
4.1. Recommendation Engine with RecommenderLab
4.2. Text Mining with RTextTools
Lesson 5: Network Analysis
5.1. igraph
5.2. Reading edgelists
5.3. Base plots
5.4. tkplots
5.5. rglplots
5.6. Network metrics like diameter, shortest path
5.7. Node metrics like centrality and betweenness
Lesson 6: Advanced Graphics
6.1. ggvis
6.2. rCharts
About LiveLessons Video Training
LiveLessons Video Training series publishes hundreds of hands-on, expert-led video tutorials covering a wide selection of technology topics designed to teach you the skills you need to succeed. This professional and personal technology video series features world-leading author instructors published by your trusted technology brands: Addison-Wesley, Cisco Press, IBM Press, Pearson IT Certification, Prentice Hall, Sams, and Que. Topics include: IT Certification, Programming, Web Development, Mobile Development, Home and Office Technologies, Business and Management, and more. View all LiveLessons on InformIT at: http://www.informit.com/livelessons.
Table of contents
- Introduction
- Lesson 1: Reading XML Data
  - Learning Objectives 00:00:26
  - 1.1 Read an HTML table 00:03:22
  - 1.2 Use XPath for complex searches in HTML 00:18:50
  - 1.3 Use xmlToList for easier parsing 00:05:31
- Lesson 2: Faster Group Operations
  - Learning Objectives 00:00:26
  - 2.1 Aggregate using formula notation with the aggregate function and tapply 00:04:29
  - 2.2 Use ddply for convenient aggregation 00:02:58
  - 2.3 Process in parallel with ddply's parallel option 00:03:31
  - 2.4 Use data.table for faster aggregation 00:02:41
  - 2.5 Use dplyr for convenient and fast aggregation 00:04:46
  - 2.6 Operate in a database using dplyr 00:04:51
- Lesson 3: Rcpp for Faster Code
- Lesson 4: Advanced Machine Learning
- Lesson 5: Network Analysis
- Lesson 6: Web Graphics
- Lesson 7: Easier Presentations and Documents with RMarkdown
- Summary
Product information
- Title: Advanced R Programming
- Author(s): Jared P. Lander
- Release date: December 2015
- Publisher(s): Pearson
- ISBN: 0134052706