Calculating top for a large time frame

One common problem is finding the top contributors out of a huge set of unique values. For instance, if you want to know which IP addresses are using the most bandwidth in a given day or week, you may have to keep a running total of request sizes for each of millions of unique hosts to answer the question definitively. With summary indexes, this means storing millions of events in the summary index, which quickly defeats the purpose of using one.
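
A minimal Python sketch of the exact approach, assuming events arrive as hypothetical (client IP, request size) pairs. It keeps one running total per unique host, so the state it must hold grows with the number of distinct hosts, not with the number of results you ask for. The function name `exact_top` and the sample events are illustrative only.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def exact_top(events: Iterable[Tuple[str, int]], n: int) -> Tuple[List[Tuple[str, int]], int]:
    """Return the n hosts with the largest summed request size, plus the
    number of per-host totals that had to be held to answer exactly."""
    totals: Counter = Counter()
    for host, size in events:
        # One running total per unique host; over a day or a week of real
        # traffic this is the structure that grows to millions of entries.
        totals[host] += size
    return totals.most_common(n), len(totals)


# Hypothetical events: (client IP, request size in bytes).
events = [
    ("10.0.0.1", 500),
    ("10.0.0.2", 1200),
    ("10.0.0.1", 900),
    ("10.0.0.3", 300),
]
top, tracked = exact_top(events, 2)
print(top)      # [('10.0.0.1', 1400), ('10.0.0.2', 1200)]
print(tracked)  # 3
```

Only two results are requested, yet all three hosts must be tracked until the end, because any of them could still overtake the others with a later request.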

Just to illustrate, let's look at a simple set of data:

Time   1.1.1.1  2.2.2.2  3.3.3.3  4.4.4.4  5.5.5.5  6.6.6.6
12:00  99       100      100      100
13:00  99                100      100      100
14:00  99       100               101      100
15:00  99                99       100      100
16:00  99       100                        100      100
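
One way to see why every total matters is to sum the hourly values from the table above, then compare that with a hypothetical shortcut that keeps only each hour's top three hosts before summing. The choice of three, the tie-breaking rule (lower IP string wins), and the helper names are assumptions made for this sketch, not part of the original example.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hourly values transcribed from the table above; a host with no entry
# for an hour is simply left out of that hour's dict.
HOURLY: Dict[str, Dict[str, int]] = {
    "12:00": {"1.1.1.1": 99, "2.2.2.2": 100, "3.3.3.3": 100, "4.4.4.4": 100},
    "13:00": {"1.1.1.1": 99, "3.3.3.3": 100, "4.4.4.4": 100, "5.5.5.5": 100},
    "14:00": {"1.1.1.1": 99, "2.2.2.2": 100, "4.4.4.4": 101, "5.5.5.5": 100},
    "15:00": {"1.1.1.1": 99, "3.3.3.3": 99, "4.4.4.4": 100, "5.5.5.5": 100},
    "16:00": {"1.1.1.1": 99, "2.2.2.2": 100, "5.5.5.5": 100, "6.6.6.6": 100},
}


def ranked(counts: Dict[str, int]) -> List[Tuple[str, int]]:
    """Sort hosts by value, largest first; ties go to the lower IP string."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def exact_totals(hourly: Dict[str, Dict[str, int]]) -> Counter:
    """Sum every host's value across every hour (the definitive answer)."""
    totals: Counter = Counter()
    for row in hourly.values():
        totals.update(row)
    return totals


def totals_from_hourly_top(hourly: Dict[str, Dict[str, int]], n: int) -> Counter:
    """Hypothetical shortcut: keep only each hour's top n hosts, then sum."""
    totals: Counter = Counter()
    for row in hourly.values():
        totals.update(dict(ranked(row)[:n]))
    return totals


exact = ranked(exact_totals(HOURLY))
approx = ranked(totals_from_hourly_top(HOURLY, 3))
print(exact[0])                 # ('1.1.1.1', 495)
print(approx[0])                # ('4.4.4.4', 401)
print(dict(approx)["1.1.1.1"])  # 99
```

1.1.1.1 never beats another host within any single hour, so the shortcut keeps it only once (at 15:00, and only because its 99-99 tie with 3.3.3.3 breaks in its favor). Its summed value falls to 99 and it ranks last, even though its true total of 495 is the largest of the six.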

Implementing Splunk: Big Data Reporting and Development for Operational Intelligence by Vincent Bumgarner
