Be smarter about your code
In a blog post I wrote showcasing the performance of this task under various optimization methods, I took it for granted that calculating the distances on the full dataset with the unparallelized, un-Rcpp-ed code would be a multi-hour affair, but I was seriously mistaken.
Shortly after I published the post, a clever R programmer commented that, by slightly reworking the code, they got the serial, pure-R version to finish in under 20 seconds on all 13,429 observations. How? Vectorization.
single.core.improved <- function(airport.locs){
  numrows <- nrow(airport.locs)
  running.sum <- 0
  for (i in 1:(numrows-1)) {
    this.dist <- sum(haversine(airport.locs[i,2], airport.locs[i, 3],
                               airport.locs[(i+1):numrows, ...
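Since the listing above is cut off, here is a minimal, self-contained sketch of the same idea: instead of calling the distance function once per pair of rows, call it once per row and pass all of the remaining rows in as vectors. This is not the commenter's code. The function `haversine.sketch`, the two `pairwise.*` helpers, and the four-row `locs` data frame are all made up for illustration, and the sketch assumes latitude sits in column 2 and longitude in column 3, as the indexing in the listing suggests.

```r
# Hypothetical stand-in for a haversine function: degrees in, kilometres out.
# It is vectorized over lat2/lon2 because it uses only vectorized arithmetic.
haversine.sketch <- function(lat1, lon1, lat2, lon2, radius = 6371) {
  to.rad <- pi / 180
  dlat <- (lat2 - lat1) * to.rad
  dlon <- (lon2 - lon1) * to.rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to.rad) * cos(lat2 * to.rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Made-up toy data: column 2 is latitude, column 3 is longitude
locs <- data.frame(id  = c("A", "B", "C", "D"),
                   lat = c(0, 0, 10, -20),
                   lon = c(0, 90, 45, 120))

# Scalar approach: one function call for every pair of rows
pairwise.scalar <- function(locs) {
  n <- nrow(locs)
  total <- 0
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      total <- total + haversine.sketch(locs[i, 2], locs[i, 3],
                                        locs[j, 2], locs[j, 3])
    }
  }
  total
}

# Vectorized approach: one function call per row, covering all later rows
pairwise.vectorized <- function(locs) {
  n <- nrow(locs)
  total <- 0
  for (i in 1:(n - 1)) {
    rest <- (i + 1):n
    total <- total + sum(haversine.sketch(locs[i, 2], locs[i, 3],
                                          locs[rest, 2], locs[rest, 3]))
  }
  total
}
```

Both functions should return the same total, which `all.equal(pairwise.scalar(locs), pairwise.vectorized(locs))` can confirm. The difference is that the vectorized version makes n - 1 calls into R's vectorized arithmetic rather than n(n - 1)/2 separate scalar calls, which is where a speedup of the kind described above would be expected to come from.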