July 2017
Beginner to intermediate
715 pages
17h 3m
English
Once we have read the data, we can calculate the statistics. As mentioned earlier, we are typically interested in summaries such as the minimum, maximum, mean, and standard deviation. We can use the Apache Commons Math library for that. Let's include it in pom.xml:
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-math3</artifactId>
    <version>3.6.1</version>
</dependency>
The library provides a SummaryStatistics class for calculating these summaries. Let's use it to calculate some statistics about the distribution of body content length across the pages we crawled:
SummaryStatistics statistics = new SummaryStatistics();
data.stream()
    .mapToDouble(RankedPage::getBodyContentLength)
    .forEach(statistics::addValue);
...
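To make clearer what such a streaming summary computes, here is a minimal, dependency-free sketch of the same idea. It is not the Commons Math implementation: RunningSummary is a hypothetical class written for this illustration, and its method names only echo those of SummaryStatistics. RankedPage is reduced to a stand-in with a single getter, and the sample lengths are made up. The sketch uses Welford's online update and reports the sample (n - 1) standard deviation.

```java
import java.util.Arrays;
import java.util.List;

public class SummarySketch {

    /** Hypothetical stand-in for RankedPage; only the one getter is modeled. */
    static class RankedPage {
        private final int bodyContentLength;

        RankedPage(int bodyContentLength) {
            this.bodyContentLength = bodyContentLength;
        }

        int getBodyContentLength() {
            return bodyContentLength;
        }
    }

    /** Hypothetical streaming summary; values are consumed one at a time and never stored. */
    static class RunningSummary {
        private long n;
        private double min = Double.NaN;
        private double max = Double.NaN;
        private double mean;
        private double m2; // running sum of squared deviations from the mean

        void addValue(double x) {
            n++;
            min = (n == 1) ? x : Math.min(min, x);
            max = (n == 1) ? x : Math.max(max, x);
            // Welford's update: adjust the mean, then accumulate the squared deviation
            double delta = x - mean;
            mean += delta / n;
            m2 += delta * (x - mean);
        }

        long getN() { return n; }
        double getMin() { return min; }
        double getMax() { return max; }
        double getMean() { return n == 0 ? Double.NaN : mean; }

        /** Sample standard deviation (divides by n - 1); 0 for a single value. */
        double getStandardDeviation() {
            if (n == 0) return Double.NaN;
            if (n == 1) return 0.0;
            return Math.sqrt(m2 / (n - 1));
        }
    }

    /** Feeds every page's body content length into a fresh summary. */
    static RunningSummary summarize(List<RankedPage> data) {
        RunningSummary statistics = new RunningSummary();
        data.stream()
            .mapToDouble(RankedPage::getBodyContentLength)
            .forEach(statistics::addValue);
        return statistics;
    }

    public static void main(String[] args) {
        // Made-up lengths, purely for illustration
        List<RankedPage> data = Arrays.asList(
            new RankedPage(100), new RankedPage(200),
            new RankedPage(300), new RankedPage(400));

        RunningSummary s = summarize(data);
        System.out.printf("n=%d min=%.1f max=%.1f mean=%.1f std=%.3f%n",
            s.getN(), s.getMin(), s.getMax(), s.getMean(), s.getStandardDeviation());
    }
}
```

Because the summary is updated value by value, it fits naturally at the end of a stream pipeline via forEach, which is the same shape as the SummaryStatistics snippet above.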