October 2017
Beginner to intermediate
572 pages
26h 1m
English
The package, rhdfs, provides functions so that users can manage HDFS using R. Similar to rmr2, you should install rhdfs on every task node, so that one can access HDFS through the R console.
To install rhdfs, you should first download rhdfs from GitHub. You can then install rhdfs in R by specifying where the HADOOP_CMD is located. You must configure R with Java support through the command, javareconf.
Next, you can access R and configure where HADOOP_CMD and HADOOP_STREAMING are located. Lastly, you can initialize rhdfs via the rhdfs.init function, which allows you to begin operating HDFS through rhdfs.
Read now
Unlock full access