The data is now in the cluster in HDFS. We'll now transform it using a SQL script. The program we're using is Hive. This program interacts with the data using SQL statements.
With most Hadoop programs (Hive, Pig, Sparks, and so on), source is read-only. It means that we cannot modify the data in the file that we transferred in the previous recipe. Some languages such as HBase allow us to modify the source data though. But for our purpose, we'll use Hive, a well-known program in the Hadoop ecosystem.