In our earlier scenario, we had multiple machine-generated web log files. Although, as we have seen, these files are too large to handle in MS Excel, individually they do not meet the criteria for big data. Continuing the scenario, however, let's suppose we now have many more files than the original set, perhaps because our website generates several files each day. Given this assumption, we need a secure repository in which to store our files and then (hopefully) access them easily.
Defining the environment
As I've mentioned, AWS gives us the ability to leverage Hadoop technology without spending all the time required to create and manage a new environment.
To use this environment, you first need an AWS account. Since this chapter ...