RHadoop for using Hadoop from R
RHadoop is a collection of open source packages that let an R user manage and analyze data stored in the Hadoop Distributed File System (HDFS).
In the background, RHadoop translates these operations into Hadoop MapReduce jobs and runs them on the cluster where the data is stored.
The various packages in RHadoop and their uses are as follows:
- rhdfs: Using this package, a user can connect to HDFS from R and perform basic actions such as reading, writing, and modifying files.
- rhbase: Using this package, a user can connect to an HBase database from R and read, write, and modify tables.
- plyrmr: Using this package, an R user can perform common data manipulation tasks such as slicing and dicing datasets. This is similar to the function of packages such ...