Summary
In this chapter, we gave a high-level definition of Hadoop and covered some fun Hadoop FAQs. We noted that reaching the limits of MS Excel does not by itself mean that you are dealing with big data, and we proved the point by using simple R scripts to manipulate and visualize the same data that would not load in Excel.
We then introduced the Amazon Web Services (AWS) environment as a simple, affordable, yet robust solution for leveraging the technology and power of Hadoop. We stepped through the process of configuring that environment for our use and uploading our multiple web log files to it, and then used Hive and its query language (HiveQL) to access and manipulate that data to accomplish the same objectives ...