January 2019
Intermediate to advanced
390 pages
9h 16m
English
hdfs3 is a lightweight Python wrapper around the C/C++ libhdfs3 library. It allows us to use HDFS natively from Python. To start, we first need to connect to the HDFS NameNode; this is done using the HDFileSystem class:
from hdfs3 import HDFileSystem
hdfs = HDFileSystem(host='localhost', port=8020)
This automatically establishes a connection with the NameNode. Now, we can access a directory listing using the following:
print(hdfs.ls('/tmp'))
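As a minimal sketch of how the listing call might be wrapped, the helper below guards against a missing path before listing it. The name list_directory is hypothetical and not part of hdfs3; the sketch assumes only that the filesystem object provides exists and ls, as HDFileSystem does, and that ls with detail=False returns plain path strings.

```python
def list_directory(fs, path):
    """Return the sorted entries under `path`, or an empty list if it does not exist."""
    # Listing a path that does not exist would raise an error, so check first.
    if not fs.exists(path):
        return []
    # detail=False requests plain path strings rather than per-file property dicts.
    return sorted(fs.ls(path, detail=False))
```

With the connection created earlier, list_directory(hdfs, '/tmp') would return the same entries as hdfs.ls('/tmp'), in sorted order.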
This will list all the files and directories in the /tmp directory. You can use functions such as mkdir to make a directory and cp to copy a file from one location to another. To write to a file, we first open it in write mode using the open method and then call write:
with hdfs.open('/tmp/file1.txt','wb') ...
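The write-then-read pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the book's own listing: write_text and read_text are hypothetical helper names, and the code assumes only that the filesystem object's open method returns a file-like object usable in a with block, as hdfs3 files are. Files are opened in binary mode, so text is encoded before writing and decoded after reading.

```python
def write_text(fs, path, text, encoding='utf-8'):
    """Write `text` to `path` and return the number of bytes written."""
    payload = text.encode(encoding)
    # 'wb' opens the file for binary writing; leaving the with block closes it.
    with fs.open(path, 'wb') as f:
        f.write(payload)
    return len(payload)


def read_text(fs, path, encoding='utf-8'):
    """Read the whole file at `path` and decode it to a string."""
    # 'rb' opens the file for binary reading; read() with no argument returns all bytes.
    with fs.open(path, 'rb') as f:
        return f.read().decode(encoding)
```

For example, write_text(hdfs, '/tmp/file1.txt', 'hello') followed by read_text(hdfs, '/tmp/file1.txt') should give back the string that was written, provided the connected user has write access to /tmp.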