Time for action – getting web server data into Hadoop
Let's take a look at the simplest way to copy data from a web server onto HDFS.
- Retrieve the text of the NameNode web interface to a local file:
$ curl localhost:50070 > web.txt
- Check the file size:
$ ls -ldh web.txt
You will receive the following response:
-rw-r--r-- 1 hadoop hadoop 246 Aug 19 08:53 web.txt
- Copy the file to HDFS:
$ hadoop fs -put web.txt web.txt
- Check the file on HDFS:
$ hadoop fs -ls
You will receive the following response:
Found 1 items
-rw-r--r-- 1 hadoop supergroup 246 2012-08-19 08:53 /user/hadoop/web.txt
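The steps above can be collected into one small shell function. This is only a sketch: `fetch_to_hdfs` is a hypothetical helper name rather than anything shipped with Hadoop, and the `-s` and `-f` flags passed to curl are additions not used in the original steps. It assumes `curl` and `hadoop` are on the PATH and that the NameNode web interface is reachable.

```shell
#!/bin/sh
# Sketch: fetch a URL to a local file, then copy that file into HDFS.
# fetch_to_hdfs is a hypothetical helper, not part of Hadoop.
fetch_to_hdfs() {
    url=$1          # for example localhost:50070
    local_file=$2   # for example web.txt
    hdfs_path=$3    # for example web.txt (relative to the HDFS home directory)

    # -s hides the progress meter; -f makes curl return an error on HTTP
    # failures so that an error page is not saved by mistake.
    curl -sf "$url" > "$local_file" || return 1

    # Refuse to upload an empty file.
    [ -s "$local_file" ] || return 1

    # Copy the local file to HDFS, then list it to confirm that it arrived.
    hadoop fs -put "$local_file" "$hdfs_path" || return 1
    hadoop fs -ls "$hdfs_path"
}
```

Calling `fetch_to_hdfs localhost:50070 web.txt web.txt` would repeat the walkthrough in one step. If the local copy is not needed, `hadoop fs -put` can also read from standard input when the source is given as `-`, so `curl -s localhost:50070 | hadoop fs -put - web.txt` should achieve the same result without the intermediate file.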
What just happened?
There shouldn't be anything that is surprising here. We use the curl utility to retrieve a web page from the embedded web server hosting ...