The groupBy operation doesn't involve any repartitioning. It converts
the input stream into a grouped stream, and its main function is to
modify the behavior of the subsequent aggregate function. The following
diagram shows how the groupBy operation groups the tuples of a single
partition:
If the groupBy operation is used before the partition aggregate, the partition aggregate runs the aggregate on each group created within the partition.
If the groupBy operation is used before the aggregate, then tuples of the same batch ...
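
The first case, grouping followed by a partition aggregate, can be illustrated with a small plain-Java sketch. It does not use the Trident API; the class name GroupBySketch, the helper countPerGroup, and the sample keys are hypothetical and exist only for illustration. Each inner list stands in for one partition, and a count is computed per group inside that partition with no data moved between partitions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupBySketch {

    // Hypothetical helper: groups the tuples of ONE partition by key and
    // counts the tuples in each group. Nothing leaves the partition.
    public static Map<String, Long> countPerGroup(List<String> partition) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String key : partition) {
            // Add 1 to the running count of this key's group.
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two partitions of the same stream (sample data, assumed).
        List<List<String>> partitions = List.of(
                List.of("a", "b", "a"),
                List.of("b", "c"));

        for (int i = 0; i < partitions.size(); i++) {
            System.out.println("partition " + i + " -> "
                    + countPerGroup(partitions.get(i)));
        }
        // prints:
        // partition 0 -> {a=2, b=1}
        // partition 1 -> {b=1, c=1}
    }
}
```

Note that the key "b" gets a separate count in each partition. That is the consequence of grouping without repartitioning: groups are formed only from the tuples already present in a partition.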