In Hadoop, we deal with large data, so performing a simple copy operation might not be the optimal thing to do. Imagine copying a 1 TB file from one cluster to another, or within the same cluster to a different path, and after 50% of the copy operation it times out. In this situation, the copy has to be started from the beginning.
This recipe shows the steps needed to copy files within and across the cluster. Ensure that the user has a running cluster with YARN configured to run MapReduce, as discussed in Chapter 1, Hadoop Architecture and Deployment.
For this recipe, there is no configuration needed to run
Distcp; just make sure HDFS and YARN is up and running.