Four short links: 23 January 2017
No Flakes in Production, In-Memory Database, MOOC Numbers, Pharma Rule of Thumb
- Putting the Science Back in Data Science (Daniel Whitenack) — it’s not worth putting a flaky implementation of an analysis into production.
- Scuba (PDF) — paper from Facebook about their millions-of-rows/second in-memory event database. Scuba stores data completely in memory on hundreds of servers, each with 144 GB RAM. I’m boggling at that scale.
- 4 Years of MOOC Data — Harvard and MIT MOOC data analyzed. MOOCs educate thousands and certify hundreds. The median number of active participants in a course is 7,902; another 1,517 are considered “explorers” — those who explore half or more of the course content. […] Among those who stated up front that they intended to become certified, the median certification rate was 30%. Among those who paid for identity verification as part of the process of becoming certified, the median completion was 60%.
- Predicting Medical AI in 2017 — interested me for this rule of thumb: the chance of an eventual clinical product and the time until that product is available will be:
  - Preclinical complete: 5% chance, 10 years
  - Phase I complete: 10% chance, 8 years
  - Phase II complete: 50% chance, 5 years
  - Phase III complete: 80% chance, 1 year
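
If you want that rule of thumb handy, it fits in a small lookup table. This is only a sketch: the figures are copied from the list above, while the names `RULE_OF_THUMB` and `outlook` and the stage keys are made up for illustration and do not come from the linked article.

```python
# Rule-of-thumb figures copied from the list above.
# The dictionary name, stage keys, and function are illustrative assumptions.
RULE_OF_THUMB = {
    "preclinical": (0.05, 10),  # (chance of eventual product, years until available)
    "phase_1": (0.10, 8),
    "phase_2": (0.50, 5),
    "phase_3": (0.80, 1),
}


def outlook(stage: str) -> str:
    """Describe the rough odds and wait for a completed trial stage."""
    chance, years = RULE_OF_THUMB[stage]
    return f"{chance:.0%} chance of a clinical product, about {years} year(s) out"


print(outlook("phase_2"))
```

Treat the numbers as the coarse heuristic they are; the table only restates them and adds no precision.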