Example 1

In our earlier scenario, we had multiple machine-generated web log files. As we have seen, these files are too large to handle in MS Excel, yet individually they do not meet the criteria of big data. Continuing the scenario, however, let's suppose that we now have many more files than the original set, perhaps because our website is generating multiple files each day. Given this assumption, we need a secure repository in which to store our files and from which we can (hopefully) easily access them.

Defining the environment

As I've mentioned, AWS gives us the ability to leverage Hadoop technology without spending all the time required to create and manage a new environment.

To use this environment, you first need an AWS account. Since this chapter ...
