File handling with Hadoopy
Hadoopy is a library in Python, which provides an API to interact with Hadoop to manage files and perform MapReduce on it. Hadoopy can be downloaded from http://www.Hadoopy.com/en/latest/tutorial.html#installing-Hadoopy.
Let's try to put a few files in Hadoop through Hadoopy in a directory created within HDFS, called data
:
$ Hadoop fs -mkdir data
Here is the code that puts the data into HDFS:
importHadoopy import os hdfs_path = '' def read_local_dir(local_path): for fn in os.listdir(local_path): path = os.path.join(local_path, fn) if os.path.isfile(path): yield path def main(): local_path = './BigData/dummy_data' for file in read_local_dir(local_path): Hadoopy.put(file, 'data') print"The file %s has been put ...
Get Mastering Python for Data Science now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.