File handling with Hadoopy
Hadoopy is a Python library that provides an API for interacting with Hadoop, both to manage files in HDFS and to run MapReduce jobs on them. Hadoopy can be downloaded from http://www.Hadoopy.com/en/latest/tutorial.html#installing-Hadoopy.
Let's put a few files into Hadoop through Hadoopy. They will go into a directory called data, which we first create within HDFS:
$ hadoop fs -mkdir data
Here is the code that puts the data into HDFS:
import hadoopy
import os

hdfs_path = ''

def read_local_dir(local_path):
    # Yield the path of every regular file directly inside local_path
    for fn in os.listdir(local_path):
        path = os.path.join(local_path, fn)
        if os.path.isfile(path):
            yield path

def main():
    local_path = './BigData/dummy_data'
    for file in read_local_dir(local_path):
        # Copy the local file into the HDFS directory 'data'
        hadoopy.put(file, 'data')
        print "The file %s has been put ...
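The same upload loop can also be sketched as a small, self-contained helper in Python 3 syntax. This is a minimal sketch rather than the book's own code: the name upload_dir and its put parameter are hypothetical and introduced only for illustration, the files are sorted so that the upload order is predictable, and the hadoopy module is assumed to be installed whenever put is not supplied. The only Hadoopy call used is hadoopy.put(local_file, hdfs_path), as in the listing above.

```python
import os


def read_local_dir(local_path):
    """Yield the path of every regular file directly inside local_path."""
    for fn in sorted(os.listdir(local_path)):
        path = os.path.join(local_path, fn)
        if os.path.isfile(path):  # skip subdirectories
            yield path


def upload_dir(local_path, hdfs_dir, put=None):
    """Copy each file in local_path to hdfs_dir and return the uploaded paths.

    `put` is a hypothetical injection point: any callable that accepts
    (local_file, hdfs_path). When it is omitted, hadoopy.put is used.
    """
    if put is None:
        # Assumption: hadoopy is installed and a Hadoop cluster is reachable.
        # Imported lazily so the helper can be exercised without Hadoop.
        import hadoopy
        put = hadoopy.put
    uploaded = []
    for path in read_local_dir(local_path):
        put(path, hdfs_dir)
        uploaded.append(path)
    return uploaded
```

With Hadoop available, calling upload_dir('./BigData/dummy_data', 'data') would copy every file from the local directory into the data directory in HDFS. Passing a stand-in function as put makes it possible to check the directory-walking logic on a machine without Hadoop.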