
142 Large Scale and Big Data
This overhead becomes more visible for the smallest skip offset of 20 MB. This
was expected, since the Rabin fingerprint must be computed over a larger fraction
of the data. Somewhat more surprising was the reduction in throughput for the
largest skip offset of 60 MB. This occurs because increasing the skip offset
increases the average chunk size, which in turn reduces the amount of parallelism
toward the end of the data upload. We therefore found 40 MB to be a reasonable
compromise between these two negative factors.
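The tradeoff can be illustrated with a small content-defined chunking sketch. This is not Incoop's implementation. It substitutes a simple polynomial rolling hash for a true Rabin fingerprint. The names `chunk_boundaries`, `WINDOW`, `BASE`, `MOD`, and `MASK` are hypothetical, and `skip` is measured in bytes rather than megabytes. After each boundary, the first `skip` bytes of the next chunk are passed over without hashing. A larger skip offset therefore means less fingerprinting work, but it also produces larger and fewer chunks.

```python
import random

# Hypothetical parameters, scaled down from MB to bytes for illustration.
WINDOW = 16          # bytes covered by the rolling-hash window
BASE = 257           # polynomial base of the rolling hash
MOD = (1 << 61) - 1  # large prime modulus (Mersenne prime 2^61 - 1)
MASK = (1 << 6) - 1  # boundary when hash & MASK == 0 (about 1 in 64 positions)


def chunk_boundaries(data: bytes, skip: int) -> list[int]:
    """Return the end offset of each content-defined chunk of `data`.

    After every boundary, `skip` bytes are skipped without computing any
    hash. A boundary is then searched for with a polynomial rolling hash
    (a stand-in for a Rabin fingerprint) over a WINDOW-byte window.
    """
    boundaries = []
    start = 0
    n = len(data)
    high = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    while start < n:
        pos = start + skip  # skipped region: no fingerprint computed here
        if pos + WINDOW > n:
            boundaries.append(n)  # too little data left: emit the final chunk
            break
        # Hash of the first window after the skipped region.
        h = 0
        for b in data[pos:pos + WINDOW]:
            h = (h * BASE + b) % MOD
        end = None
        while True:
            if h & MASK == 0:
                end = pos + WINDOW  # boundary after the current window
                break
            if pos + WINDOW >= n:
                break  # reached end of data without finding a boundary
            # Slide the window one byte: drop data[pos], add data[pos + WINDOW].
            h = ((h - data[pos] * high) * BASE + data[pos + WINDOW]) % MOD
            pos += 1
        if end is None:
            boundaries.append(n)
            break
        boundaries.append(end)
        start = end
    return boundaries


if __name__ == "__main__":
    random.seed(0)
    data = bytes(random.getrandbits(8) for _ in range(20000))
    for skip in (0, 100, 400):
        bounds = chunk_boundaries(data, skip)
        print(f"skip={skip}: {len(bounds)} chunks, "
              f"average size {len(data) / len(bounds):.0f} bytes")
```

Every chunk except the last is at least `skip + WINDOW` bytes long. Raising `skip` therefore trades fewer hashed positions for larger chunks, and so for fewer units of parallel work, mirroring the two effects described above.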
4.6.5 Work and Time Speedup
We report the speedup of Incoop ...