Data Analysis with R - Second Edition by Tony Fischetti

Being smarter about your code

In a blog post I wrote comparing the performance of this task under various optimization methods, I took it for granted that calculating the distances on the full dataset with the unparallelized, non-Rcpp code would be a multi-hour affair. I was seriously mistaken.

Shortly after I published the post, a clever R programmer commented that they had slightly reworked the code so that the serial, pure-R version took less than 20 seconds to complete on all 13,429 observations. How? Vectorization. The following code illustrates the technique:

single.core.improved <- function(airport.locs){
  numrows <- nrow(airport.locs)
  running.sum <- 0
  for (i in 1:(numrows-1)) {
    this.dist <- sum(haversine(airport.locs[i,2], ...
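
Since the listing above is cut off, here is a minimal, self-contained sketch of the same idea rather than the original code. It assumes a standard haversine formula (the version used elsewhere in the book may differ), a hypothetical data frame `toy.locs` of randomly generated points with latitude in column 2 and longitude in column 3, and that the task is to average the distance over every pair of rows. The function names `mean.dist.pairwise` and `mean.dist.vectorized` are made up for illustration. The first makes one `haversine` call per pair; the second makes one call per row and lets R compute the distances to all later rows at once.

```r
# Great-circle distance in km; works element-wise on vectors.
# Assumption: a standard haversine formula, not necessarily the book's.
haversine <- function(lat1, long1, lat2, long2, radius = 6371) {
  to.rad <- pi / 180
  dlat   <- (lat2 - lat1) * to.rad
  dlong  <- (long2 - long1) * to.rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to.rad) * cos(lat2 * to.rad) * sin(dlong / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Baseline: one haversine() call for every pair of rows.
mean.dist.pairwise <- function(locs) {
  n <- nrow(locs)
  running.sum <- 0
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      running.sum <- running.sum +
        haversine(locs[i, 2], locs[i, 3], locs[j, 2], locs[j, 3])
    }
  }
  running.sum / (n * (n - 1) / 2)
}

# Vectorized: one haversine() call per row, against all later rows at once.
mean.dist.vectorized <- function(locs) {
  n <- nrow(locs)
  running.sum <- 0
  for (i in 1:(n - 1)) {
    rest <- (i + 1):n
    running.sum <- running.sum +
      sum(haversine(locs[i, 2], locs[i, 3], locs[rest, 2], locs[rest, 3]))
  }
  running.sum / (n * (n - 1) / 2)
}

# Hypothetical toy data: 200 random points, not the airport dataset.
set.seed(1)
toy.locs <- data.frame(id   = seq_len(200),
                       lat  = runif(200, -60, 60),
                       long = runif(200, -180, 180))

slow <- mean.dist.pairwise(toy.locs)
fast <- mean.dist.vectorized(toy.locs)
print(all.equal(slow, fast))
```

The two results should agree up to floating-point rounding. Wrapping each call in `system.time()` is one way to see the difference in running time, which comes from replacing many small interpreted calls with a few calls that operate on whole vectors.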
