# Business Analytics with R—Statistics and Machine Learning

Published by O'Reilly Media, Inc.

Created February 2018

**What is this learning path about, and why is it important?**

For analyzing and graphing large amounts of statistical data, one programming language today stands out as a favorite among data analysts, engineers, and statisticians: R. And with the rapidly growing popularity of machine learning, R’s ability to work with large datasets and implement machine learning models has made this language an even more valuable skill to have in your personal toolset.

This learning path is designed for intermediate-level data analysts, scientists, engineers, and programmers who work with large datasets. You’ll learn how to effectively organize large amounts of data within the R platform, a key skill for any business that regularly deals with big data. The learning path is presented in three parts. In part 1, “Big Data With R and SQL: Databases and Data Manipulation,” you’ll look at how to work with SQL databases through the R platform and manipulate that data for efficient analysis. In part 2, “Regression Analysis and Hypothesis Testing for Inference of Business Relationships in R,” you’re introduced to the key statistical techniques used to work with cross-sectional and time series datasets, and you’ll explore how to generate data-driven insights from your data. Finally, in part 3, “Machine Learning and R: Automation Methods for Business Analysis,” you’ll examine key machine learning techniques that you can use to automate certain aspects of data analysis and reveal key findings that traditional statistical programming typically misses. By the end of this learning path, you should be comfortable handling large amounts of data within the R platform and understand how to use large data banks to create comprehensive statistical models.

**What you’ll learn—and how you can apply it**

- Connect R to SQL databases to import data and run queries
- Use data manipulation libraries in R (e.g., plyr, data.table) to efficiently organize large datasets
- Build linear models (Ordinary Least Squares and Logistic Regression) to quantify key business relationships
- Implement key machine learning models on your data, including *k*-means, decision trees, random forests, and neural network models

**This learning path is for you because…**

- You're a data analyst, data scientist, or engineer who wants to learn key methods to effectively organize large and unstructured data
- You want to learn how to employ regression analysis to unearth important relationships between a variety of metrics
- You want to gain key insights into using machine learning to adapt to your data and automate key elements of the data analysis process

**Prerequisites:**

- You should have a basic familiarity with key principles of regression analysis
- Experience in handling large datasets or working with databases will be helpful
- You do not need any prior knowledge of R

**Materials or downloads needed in advance:**

- Access to RStudio or Jupyter Notebook (with R installed)
- All datasets used in the video examples will be provided for you to practice implementing the necessary techniques
- The code illustrated in this learning path (and accompanying results) can be reproduced in a Jupyter Notebook or on another suitable platform