March 2018
Beginner to intermediate
570 pages
13h 42m
English
In a blog post that I penned showcasing the performance of this task under various optimization methods, I took it for granted that calculating the distances on the full dataset with the unparallelized/un-Rcpp-ed code would be a multi-hour affair, but I was seriously mistaken.
Shortly after I published the post, a clever R programmer commented that, with a slight rework, the serial, pure-R code took less than 20 seconds to complete on all 13,429 observations. How? Vectorization: rather than computing one distance per function call, the reworked code passes whole vectors to the distance function and sums the results. The following code illustrates the technique:
single.core.improved <- function(airport.locs){
  numrows <- nrow(airport.locs)
  running.sum <- 0
  for (i in 1:(numrows-1)) {
    this.dist <- sum(haversine(airport.locs[i,2], ...
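To see why this kind of rework pays off, here is a minimal, self-contained sketch that contrasts a pair-by-pair loop with a vectorized inner step. It is not the commenter's code. The `haversine` helper below is an assumed stand-in (mean Earth radius of 6371 km), the `locs` data frame is hypothetical toy data, and the column layout (column 2 for latitude, column 3 for longitude) is likewise an assumption. Both functions expect at least two rows.

```r
# Assumed helper: great-circle distance in km. Every operation is
# elementwise, so lat2/long2 may be vectors while lat1/long1 are scalars.
haversine <- function(lat1, long1, lat2, long2, radius = 6371) {
  to.rad <- pi / 180
  dphi <- (lat2 - lat1) * to.rad
  dlambda <- (long2 - long1) * to.rad
  a <- sin(dphi / 2)^2 +
    cos(lat1 * to.rad) * cos(lat2 * to.rad) * sin(dlambda / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical toy data: column 2 = latitude, column 3 = longitude
locs <- data.frame(
  id   = c("A", "B", "C", "D"),
  lat  = c(40.64, 33.94, 41.98, 29.98),
  long = c(-73.78, -118.41, -87.90, -95.34)
)

# Scalar version: one haversine() call per pair of rows
sum.pairs.scalar <- function(locs) {
  n <- nrow(locs)
  total <- 0
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      total <- total +
        haversine(locs[i, 2], locs[i, 3], locs[j, 2], locs[j, 3])
    }
  }
  total
}

# Vectorized version: one haversine() call per row, comparing row i
# against all later rows at once, then summing the resulting vector
sum.pairs.vectorized <- function(locs) {
  n <- nrow(locs)
  total <- 0
  for (i in 1:(n - 1)) {
    later <- (i + 1):n
    total <- total +
      sum(haversine(locs[i, 2], locs[i, 3],
                    locs[later, 2], locs[later, 3]))
  }
  total
}

# Both approaches should agree up to floating-point tolerance
all.equal(sum.pairs.scalar(locs), sum.pairs.vectorized(locs))
```

The vectorized version makes n - 1 calls into R's internally compiled arithmetic instead of n(n - 1)/2 interpreted calls, which is where the bulk of the savings would be expected to come from.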