Big Data Analytics with R

Name: Big Data Analytics with R
Author: Simon Walkowiak
ISBN: 9781786466457

July 2016

Beginner to intermediate

506 pages

11h 23m

English

Packt Publishing

Big Data Analytics with R
Credits
About the Author
Acknowledgement
About the Reviewers
www.PacktPub.com
eBooks, discount offers, and more
Why subscribe?
Preface
What this book covers
What you need for this book
Who this book is for

Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. The Era of Big Data
Big Data – The monster re-defined
Big Data toolbox - dealing with the giant
Hadoop - the elephant in the room
Databases
Hadoop Spark-ed up
R – The unsung Big Data hero
Summary
2. Introduction to R Programming Language and Statistical Environment
Learning R
Revisiting R basics
Getting R and RStudio ready
Setting the URLs to R repositories
R data structures
Vectors
Scalars
Matrices
Arrays
Data frames
Lists
Exporting R data objects
Applied data science with R
Importing data from different formats
Exploratory Data Analysis
Data aggregations and contingency tables
Hypothesis testing and statistical inference
Tests of differences
Independent t-test example (with power and effect size estimates)
ANOVA example
Tests of relationships
An example of Pearson's r correlations
Multiple regression example
Data visualization packages
Summary
3. Unleashing the Power of R from Within
Traditional limitations of R
Out-of-memory data
Processing speed
To the memory limits and beyond
Data transformations and aggregations with the ff and ffbase packages
Generalized linear models with the ff and ffbase packages
Logistic regression example with ffbase and biglm
Expanding memory with the bigmemory package
Parallel R
From bigmemory to faster computations
An apply() example with the big.matrix object
A for() loop example with the ffdf object
Using apply() and for() loop examples on a data.frame
A parallel package example
A foreach package example
The future of parallel processing in R
Utilizing Graphics Processing Units with R
Multi-threading with Microsoft R Open distribution
Parallel machine learning with H2O and R
Boosting R performance with the data.table package and other tools
Fast data import and manipulation with the data.table package
Data import with data.table
Lightning-fast subsets and aggregations on data.table
Chaining, more complex aggregations, and pivot tables with data.table
Writing better R code
Summary
4. Hadoop and MapReduce Framework for R
Hadoop architecture
Hadoop Distributed File System
MapReduce framework
A simple MapReduce word count example
Other Hadoop native tools
Learning Hadoop
A single-node Hadoop in Cloud
Deploying Hortonworks Sandbox on Azure
A word count example in Hadoop using Java
A word count example in Hadoop using the R language
RStudio Server on a Linux RedHat/CentOS virtual machine
Installing and configuring RHadoop packages
HDFS management and MapReduce in R - a word count example
HDInsight - a multi-node Hadoop cluster on Azure
Creating your first HDInsight cluster
Creating a new Resource Group
Deploying a Virtual Network
Creating a Network Security Group
Setting up and configuring an HDInsight cluster
Starting the cluster and exploring Ambari
Connecting to the HDInsight cluster and installing RStudio Server
Adding a new inbound security rule for port 8787
Editing the Virtual Network's public IP address for the head node
Smart energy meter readings analysis example – using R on HDInsight cluster
Summary
5. R with Relational Database Management Systems (RDBMSs)
Relational Database Management Systems (RDBMSs)
A short overview of used RDBMSs
Structured Query Language (SQL)
SQLite with R
Preparing and importing data into a local SQLite database
Connecting to SQLite from RStudio
MariaDB with R on an Amazon EC2 instance
Preparing the EC2 instance and RStudio Server for use
Preparing MariaDB and data for use
Working with MariaDB from RStudio
PostgreSQL with R on Amazon RDS
Launching an Amazon RDS database instance
Preparing and uploading data to Amazon RDS
Remotely querying PostgreSQL on Amazon RDS from RStudio
Summary
6. R with Non-Relational (NoSQL) Databases
Introduction to NoSQL databases
Review of leading non-relational databases
MongoDB with R
Introduction to MongoDB
MongoDB data models
Installing MongoDB with R on Amazon EC2
Processing Big Data using MongoDB with R
Importing data into MongoDB and basic MongoDB commands
MongoDB with R using the rmongodb package
MongoDB with R using the RMongo package
MongoDB with R using the mongolite package
HBase with R
Azure HDInsight with HBase and RStudio Server
Importing the data to HDFS and HBase
Reading and querying HBase using the rhbase package
Summary
7. Faster than Hadoop - Spark with R
Spark for Big Data analytics
Spark with R on a multi-node HDInsight cluster
Launching HDInsight with Spark and R/RStudio
Reading the data into HDFS and Hive
Getting the data into HDFS
Importing data from HDFS to Hive
Bay Area Bike Share analysis using SparkR
Summary
8. Machine Learning Methods for Big Data in R
What is machine learning?
Machine learning algorithms
Supervised and unsupervised machine learning methods
Classification and clustering algorithms
Machine learning methods with R
Big Data machine learning tools
GLM example with Spark and R on the HDInsight cluster
Preparing the Spark cluster and reading the data from HDFS
Logistic regression in Spark with R
Naive Bayes with H2O on Hadoop with R
Running an H2O instance on Hadoop with R
Reading and exploring the data in H2O
Naive Bayes on H2O with R
Neural Networks with H2O on Hadoop with R
How do Neural Networks work?
Running Deep Learning models on H2O
Summary
9. The Future of R - Big, Fast, and Smart Data
The current state of Big Data analytics with R
Out-of-memory data on a single machine
Faster data processing with R
Hadoop with R
Spark with R
R with databases
Machine learning with R
The future of R
Big Data
Fast data
Smart data
Where to go next
Summary

Overview

Unlock the potential of big data analytics by mastering R programming with this comprehensive guide. This book takes you step-by-step through real-world scenarios where R's capabilities shine, providing you with practical skills to handle, process, and analyze large and complex datasets effectively.

What this book will help me do

Understand the latest big data processing methods and how R can enhance their application.
Set up and use big data platforms such as Hadoop and Spark in conjunction with R.
Utilize R for practical big data problems, such as analyzing consumption and behavioral datasets.
Integrate R with SQL and NoSQL databases to maximize its versatility in data management.
Discover advanced machine learning implementations using R and Spark MLlib for predictive analytics.

Author(s)

Simon Walkowiak is an experienced data analyst and R programming expert with a passion for data engineering and machine learning. With deep knowledge of big data platforms and extensive teaching experience, the author brings a clear and approachable writing style to help learners excel.

Who is it for?

Ideal for data analysts, scientists, and engineers with fundamental data analysis knowledge looking to enhance their big data capabilities using R. If you aim to adapt R for large-scale data management and analysis workflows, this book is your ideal companion to bridge the gap.


Big Data Analytics

Venkat Ankam, Aravind Nallan

Practical Big Data Analytics

Nataraj Dasgupta

R: Data Analysis and Visualization

Tony Fischetti, Brett Lantz, Jaynal Abedin, Hrishi V. Mittal, Bater Makhabel, Edina Berlinger, Ferenc Illés, Milán Badics, Ádám Banai, Gergely Daróczi, Barbara Dömötör, Gergely Gabler, Dániel Havran, Péter Juhász, István Margitai, Balázs Márkus, Péter Medvegyev, Julia Molnár, Balázs Árpád Szucs, Ágnes Tuza, Tamás Vadász, Kata Váradi, Ágnes Vidovics-Dancs

Regression Analysis with R

Giuseppe Ciaburro, Pierre Paquay, Manoj Kumar, Shaikh Salamatullah
