Web log analytics

Web logs are data generated by the web servers that run a website. This use case applies to any company that hosts its own website and wants to know more about the site's performance and about how customers behave on it.
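
To make the idea of a web log concrete, here is a minimal Python sketch that parses log lines and counts hits per page. It assumes the logs follow the Apache Common Log Format, which may not match the layout of the sample file used in this recipe. The sample lines, the `parse_line` and `hits_per_path` helper names, and the regular expression are all made up for illustration.

```python
import re
from collections import Counter

# Assumption: lines follow the Apache Common Log Format:
# host ident user [time] "method path protocol" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A "-" size means the server sent no response body.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


def hits_per_path(lines):
    """Count successfully parsed requests per requested path."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:
            counts[record["path"]] += 1
    return counts


# Hypothetical sample lines, not taken from the recipe's data file.
SAMPLE_LINES = [
    '192.0.2.10 - - [10/Oct/2015:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.11 - - [10/Oct/2015:13:56:01 +0000] "GET /products.html HTTP/1.1" 200 5120',
    '192.0.2.10 - - [10/Oct/2015:13:57:12 +0000] "GET /index.html HTTP/1.1" 304 -',
    'this line is malformed and is skipped',
]

if __name__ == "__main__":
    for path, count in hits_per_path(SAMPLE_LINES).most_common():
        print(path, count)
```

Lines that do not match the assumed format are skipped rather than raising an error, since real log files often contain a few malformed entries.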

Getting ready

To perform this recipe, you should have a Hadoop cluster up and running. I have uploaded some sample web log data to

https://github.com/deshpandetanmay/hadoop-real-world-cookbook/blob/master/data/mylog.txt.

How to do it...

Before jumping into the solution, let's first try to understand the problem statement:

Problem statement

Many companies run their businesses through their websites, so website performance directly affects their sales and profitability. Web servers generally log information about ...
