In the previous chapter, we set up a running Hadoop cluster with one Master node and three Worker nodes on top of OpenStack. Be aware that running a job of any type in Sahara requires the provisioned cluster to be in the Active state.
Since we intend to use Swift for input and output data, the first example illustrates how to tidy up a simple text file by trimming each line and removing unwanted spaces. The text file looks like the following:
OpenStack EDP Sahara Swift Jobs
To do so, we will execute a Pig job in the Sahara cluster and specify the location of the text file, named input, in Swift. The Pig script might look like the following:
I = load '$INPUT' using PigStorage(':') as (cloud: chararray);
O = foreach I generate ...
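To make the intended cleanup concrete, the following is a minimal Python sketch of the per-line transformation described above. It is not the Pig script and does not run in Sahara. It assumes that "trimming and removing space" means stripping leading and trailing whitespace and collapsing runs of inner whitespace into a single space. The function names `neaten_line` and `neaten_text` and the padded sample line are hypothetical, chosen only for illustration.

```python
import re


def neaten_line(line: str) -> str:
    """Strip the ends of a line and collapse inner whitespace runs.

    Assumption: this is what the cleanup is meant to do; the source
    does not spell out the exact rule.
    """
    return re.sub(r"\s+", " ", line.strip())


def neaten_text(text: str) -> str:
    """Apply neaten_line to every line of a multi-line string."""
    return "\n".join(neaten_line(line) for line in text.splitlines())


if __name__ == "__main__":
    # Hypothetical input line with stray spaces, for illustration only.
    sample = "  OpenStack   EDP  Sahara Swift    Jobs  "
    print(neaten_line(sample))  # prints: OpenStack EDP Sahara Swift Jobs
```

Running the same logic locally on a small sample can be a quick way to check the expected output before submitting the job to the cluster.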