Apache Commons Math

Once we have read the data, we can calculate statistics on it. As mentioned earlier, we are typically interested in summaries such as the minimum, maximum, mean, and standard deviation. We can use the Apache Commons Math library for that. Let's include it in pom.xml:

<dependency>
  <groupId>org.apache.commons</groupId>
  <artifactId>commons-math3</artifactId>
  <version>3.6.1</version>
</dependency>

The library provides a SummaryStatistics class for calculating these summaries. Let's use it to calculate some statistics about the distribution of the body content length of the pages we crawled:

SummaryStatistics statistics = new SummaryStatistics();
data.stream().mapToDouble(RankedPage::getBodyContentLength)
    .forEach(statistics::addValue);
...
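To make the summaries themselves concrete, here is a minimal sketch that computes the same four quantities with only the JDK, so it runs without the Commons Math dependency. It is not the library's implementation: the class name BodyLengthSummary, the helper sampleStdDev, and the sample values are all hypothetical and chosen for illustration. The standard deviation uses the n - 1 (sample) denominator, which, as far as the Commons Math documentation describes, is also what SummaryStatistics reports by default through getStandardDeviation(); the minimum, maximum, and mean correspond to its getMin(), getMax(), and getMean() accessors.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;

// Hypothetical helper class, not part of Apache Commons Math.
final class BodyLengthSummary {

    // Sample standard deviation (n - 1 denominator), computed in two passes.
    // Returning NaN for no values and 0.0 for a single value is a convention
    // chosen for this sketch.
    static double sampleStdDev(double[] values) {
        if (values.length == 0) {
            return Double.NaN;
        }
        if (values.length == 1) {
            return 0.0;
        }
        double mean = Arrays.stream(values).average().orElse(Double.NaN);
        double sumOfSquares = 0.0;
        for (double v : values) {
            double deviation = v - mean;
            sumOfSquares += deviation * deviation;
        }
        return Math.sqrt(sumOfSquares / (values.length - 1));
    }

    public static void main(String[] args) {
        // Made-up body content lengths, for illustration only.
        double[] lengths = {2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0};

        // The JDK's DoubleSummaryStatistics covers min, max, and mean,
        // but has no standard deviation, hence the helper above.
        DoubleSummaryStatistics basic = Arrays.stream(lengths).summaryStatistics();

        System.out.printf("min=%.1f, max=%.1f, mean=%.1f, std=%.4f%n",
                basic.getMin(), basic.getMax(), basic.getAverage(),
                sampleStdDev(lengths));
    }
}
```

The JDK class stops at count, sum, min, max, and average, which is one reason to reach for a library class such as SummaryStatistics when the standard deviation is needed as well.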
