We will build a use case using filters, group by, and aggregators. The use case finds the top 10 devices that generate the most data in a batch. Here is the pseudocode:
- Write a data generator that publishes events with fields such as phone number, bytes in, and bytes out
- The data generator publishes these events to Kafka
- Write a topology program that does the following:
  - Get the events from Kafka
  - Apply a filter to exclude certain phone numbers from the top 10 calculation
  - Split each event on commas
  - Perform a group by operation to bring events with the same phone number together
  - Perform an aggregation that sums bytes in and bytes out
  - Apply an assembly with the FirstN function, which requires the field name and the number of elements to return
  - Finally, display ...
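
The topology steps above can be sketched on a single in-memory batch to make the data flow concrete. This is a minimal sketch in plain Java, not Storm or Kafka code: it assumes each event is a comma-separated string of the form `phone,bytesIn,bytesOut`, and the class name `TopDevicesSketch`, the method `topN`, and the sample phone numbers are all hypothetical names chosen for illustration. The final sort-and-truncate step only approximates what FirstN does in the real topology.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TopDevicesSketch {

    /** Returns up to n phone numbers with their total bytes, largest total first. */
    static Map<String, Long> topN(List<String> batch, Set<String> excluded, int n) {
        Map<String, Long> totals = new HashMap<>();
        for (String event : batch) {
            // Split the event on commas; assumed layout: phone,bytesIn,bytesOut.
            String[] parts = event.split(",");
            if (parts.length != 3) {
                continue; // skip malformed events
            }
            String phone = parts[0].trim();
            // Filter: excluded phone numbers take no part in the ranking.
            if (excluded.contains(phone)) {
                continue;
            }
            long bytes = Long.parseLong(parts[1].trim()) + Long.parseLong(parts[2].trim());
            // Group by phone number and sum bytes in plus bytes out.
            totals.merge(phone, bytes, Long::sum);
        }

        // Rank by total descending; break ties by phone number so output is stable.
        List<Map.Entry<String, Long>> ranked = new ArrayList<>(totals.entrySet());
        ranked.sort((a, b) -> {
            int byTotal = Long.compare(b.getValue(), a.getValue());
            return byTotal != 0 ? byTotal : a.getKey().compareTo(b.getKey());
        });

        // Keep only the first n entries, preserving the ranked order.
        Map<String, Long> top = new LinkedHashMap<>();
        for (Map.Entry<String, Long> entry : ranked.subList(0, Math.min(n, ranked.size()))) {
            top.put(entry.getKey(), entry.getValue());
        }
        return top;
    }

    public static void main(String[] args) {
        List<String> batch = Arrays.asList(
                "555-0101,100,200",
                "555-0102,500,500",
                "555-0101,50,50",
                "555-0199,9000,9000",
                "555-0103,10,20");
        Set<String> excluded = Collections.singleton("555-0199");
        // Prints {555-0102=1000, 555-0101=400}
        System.out.println(topN(batch, excluded, 2));
    }
}
```

In this sample, `555-0199` has by far the largest traffic but is dropped by the filter before grouping, so it never reaches the ranking step. In the use case itself, `n` would be 10.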