Video description
This course shows data analysts familiar with R how to apply their R skills in a big data environment using Spark, distributed computing, and cloud storage.
You'll learn to create Spark clusters on the Amazon Web Services (AWS) platform; perform cluster-based data modeling using Gaussian generalized linear models, binomial generalized linear models, Naive Bayes, and K-means; access data from S3 as Spark DataFrames, along with other sources like CSV, JSON, and HDFS; and perform cluster-based data manipulation with tools like SparkR and SparkSQL. By course end, you'll be capable of working with massive data sets that cannot be processed on a single computer. This hands-on class requires each learner to set up their own extremely low-cost, easily terminated AWS account.
- Discover how to use your R skills in a distributed, cloud-based big data cluster environment
- Gain hands-on experience setting up Spark clusters on Amazon's AWS cloud services platform
- Understand how to control a cloud instance on AWS using SSH or PuTTY
- Explore basic distributed modeling techniques like GLM, Naive Bayes, and K-means
- Learn to do cloud-based data manipulation and processing using SparkR and SparkSQL
- Understand how to access data from CSV, JSON, HDFS, and S3 sources
Table of contents
Introduction
- Welcome to the Course 00:04:21
- About the Author 00:01:09
Creating Clusters on Amazon Web Services
- Creating an AWS Launching Instance 00:09:40
- Connecting to AWS Instance using SSH 00:06:19
- Connecting to AWS Instance using PuTTY 00:08:37
- Starting Spark Clusters Part 1 00:09:02
- Starting Spark Clusters Part 2 00:09:55
- Terminate Your Clusters 00:00:58
Data and Modeling Basics
- Data Basics 00:08:34
- Modeling with Gaussian Generalized Linear Models 00:11:19
- Modeling with Binomial Generalized Linear Models 00:09:34
- Naive Bayes and K-Means Modeling 00:09:14
Data Sources and Data Manipulation
- Bigger Data and S3 00:07:27
- Accessing S3 Spark Dataframes 00:04:57
- SparkR Dataframe Operations 00:11:01
- SparkSQL 00:05:16
Various
- Brief Look at HDFS 00:11:00
- Brief Look at Databricks Community Edition 00:08:20
Conclusion
- Wrap Up and Thank You 00:02:02
Product information
- Title: Using R for Big Data with Spark
- Author(s):
- Release date: October 2016
- Publisher(s): Infinite Skills
- ISBN: 9781491973028