Using the HDFS Java API

The HDFS Java API can be used to interact with HDFS from any Java program. It lets other Java programs use the data stored in HDFS and lets non-Hadoop computational frameworks process that data. Occasionally, you may also come across a use case where you want to access HDFS directly from within a MapReduce application. However, if you write or modify files in HDFS directly from a Map or Reduce task, be aware that you are violating the side-effect-free nature of MapReduce, which might lead to data consistency issues depending on your use case.
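To make this concrete, here is a minimal sketch of a standalone Java program that writes a small file through the Hadoop FileSystem API and reads it back. It is an illustration only, not the recipe's own code. The class name HdfsApiSketch, the helper writeAndRead, and the path /tmp/hdfs-api-sketch/demo.txt are hypothetical. The sketch assumes the Hadoop client libraries are on the classpath and that fs.defaultFS in the loaded configuration points at your HDFS installation; if it does not, Hadoop falls back to the local filesystem.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Hypothetical class name; a sketch, not the book's example code.
public class HdfsApiSketch {

    // Writes text to the given path (overwriting any existing file),
    // then reads the file back and returns its contents.
    static String writeAndRead(FileSystem fs, Path path, String text) throws IOException {
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        try (FSDataInputStream in = fs.open(path)) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            // Copy the whole stream; 'false' leaves closing to try-with-resources.
            IOUtils.copyBytes(in, buffer, 4096, false);
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Loads core-site.xml and related files from the classpath, if present.
        Configuration conf = new Configuration();
        // Returns the filesystem named by fs.defaultFS (assumed to be HDFS here).
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical demo path, chosen only for illustration.
        Path path = new Path("/tmp/hdfs-api-sketch/demo.txt");
        System.out.println(writeAndRead(fs, path, "Hello from the HDFS Java API"));

        // Remove the demo file again (non-recursive delete).
        fs.delete(path, false);
    }
}
```

Because writeAndRead takes the FileSystem as a parameter, the same code can be exercised against the local filesystem (for example, one obtained from FileSystem.getLocal) before you point it at a cluster.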

How to do it...

The following steps show you how to use the HDFS Java API to perform filesystem operations on an HDFS installation ...
