By Nat Torkington
January 23, 2017
  1. Putting the Science Back in Data Science (Daniel Whitenack) — it’s not worth putting a flaky implementation of an analysis into production.
  2. Scuba (PDF) — paper from Facebook about their millions-of-rows/second in-memory event database. Scuba stores data completely in memory on hundreds of servers, each with 144 GB RAM. I’m boggling at that scale.
  4. 4 Years of MOOC Data — Harvard and MIT MOOC data analyzed. MOOCs educate thousands and certify hundreds. The median number of active participants in a course is 7,902; another 1,517 are considered “explorers” — those who explore half or more of the course content. […] Among those who stated up front that they intended to become certified, the median certification rate was 30%. Among those who paid for identity verification as part of the process of becoming certified, the median completion was 60%.
  5. Predicting Medical AI in 2017 — interested me for this rule of thumb on the chance of an eventual clinical product and the time until that product is available:
    Preclinical complete: 5% chance, 10 years
    Phase I complete: 10% chance, 8 years
    Phase II complete: 50% chance, 5 years
    Phase III complete: 80% chance, 1 year
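
If you want to play with that rule of thumb, it fits in a small lookup table. This is only a sketch: the numbers are copied from the list above, while the names `RULE_OF_THUMB` and `outlook` and the stage keys are made up for illustration and do not come from the linked post.

```python
# Rule-of-thumb figures copied from the list above.
# The dictionary, keys, and function name are hypothetical,
# chosen only for this illustration.
RULE_OF_THUMB = {
    "preclinical": (0.05, 10),  # (chance of eventual product, years until available)
    "phase_1": (0.10, 8),
    "phase_2": (0.50, 5),
    "phase_3": (0.80, 1),
}


def outlook(stage):
    """Return a one-line summary for the last completed stage."""
    chance, years = RULE_OF_THUMB[stage]
    return f"{chance:.0%} chance of a clinical product, about {years} year(s) out"


print(outlook("phase_2"))
```

Keeping the chance and the wait together in one tuple makes it harder to quote one without the other, which is the point of the rule.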
