Modern Systems Programming with Scala Native

Aggregation at Scale

In a higher-level idiom, without the constraints on scale, we could easily sum up the counts by year with a map or dictionary. For example, in more verbose Scala, we could do something like this:

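	import scala.collection.mutable

	val data: Seq[NGramData] = ???
	val m = mutable.Map[String, Int]()
	for (d <- data) {
	  if (m.contains(d.word)) {
	    // Word already seen: add this count to its running total.
	    m(d.word) += d.count
	  } else {
	    // First occurrence: start the total at this count.
	    m(d.word) = d.count
	  }
	}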

Or in an even more functional style, admittedly at the cost of further efficiency, we could do this:

	val g = data.groupBy(_.word).map { case (word, ds) => (word, ds.map(_.count).sum) }
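
To make the comparison concrete, here is a minimal, self-contained sketch of both styles run against a tiny in-memory sample. The shape of `NGramData` is an assumption (the text above only uses its `word` and `count` fields; `year` is added for illustration), and the object name, method names, and sample values are hypothetical.

```scala
import scala.collection.mutable

// Hypothetical record shape: only `word` and `count` appear in the text above;
// `year` is an assumed extra field.
final case class NGramData(word: String, year: Int, count: Int)

object AggregateSketch {
  // Imperative style: a single pass over the data, updating one mutable map.
  def sumMutable(data: Seq[NGramData]): Map[String, Int] = {
    val totals = mutable.Map.empty[String, Int]
    for (d <- data) {
      // getOrElse covers the first occurrence of a word without an if/else.
      totals(d.word) = totals.getOrElse(d.word, 0) + d.count
    }
    totals.toMap
  }

  // Functional style: groupBy first materializes a Seq of records per word,
  // and only then are the counts in each group summed.
  def sumGrouped(data: Seq[NGramData]): Map[String, Int] =
    data.groupBy(_.word).map { case (word, rows) => (word, rows.map(_.count).sum) }

  // Made-up sample values, for illustration only.
  val sample: Seq[NGramData] = Seq(
    NGramData("scala", 2007, 3),
    NGramData("native", 2007, 1),
    NGramData("scala", 2008, 5)
  )

  def main(args: Array[String]): Unit = {
    println(sumMutable(sample))
    println(sumGrouped(sample))
  }
}
```

Both methods return the same totals per word. The grouped version allocates an intermediate collection for every distinct key, which is one plausible reason it costs more than the single-pass loop. Both versions also assume that the whole data set and the resulting map fit in memory.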

However, when scale is a concern, planning this sort of bulk data-processing job can be a genuinely hard problem, and Scala frameworks like Spark^[18] often do a great job of it. A common technique for ...


Modern Systems Programming with Scala Native by Richard Whaling
