R is a popular statistical programming language with a wide range of extensions and packages that support data processing and machine learning tasks. However, data analysis in R is often limited by two main factors:
- Single-threaded runtime environment: By default, R runs on a single thread, which often increases processing time and slows down your data analysis.
- Limitation of a single machine's memory: When R accesses data stored in a data.frame, a CSV file, or any other format, the entire dataset must fit in memory, which becomes a bottleneck when working with a large dataset.
In this chapter, we will discuss how to overcome these two major pain points of R by accessing data from a data store (MySQL and MongoDB) or by running the analysis on a distributed system with Apache ...