Four short links: 19 March 2018
ML Reproducibility, Data Research, Blockchain IP Crimes, and PLATO Flashback
- The Machine Learning Reproducibility Crisis (Pete Warden) — “Even the original author sometimes couldn’t train the same model and get similar results! He was hoping that I had a solution I could recommend, but I had to admit that I struggle with the same problems in my own work.”
- Facebook and Cambridge Analytica (Guardian) — Horrifying. “We exploited Facebook to harvest millions of people’s profiles. And built models to exploit what we knew about them and target their inner demons. That was the basis the entire company was built on.” Short story: a researcher makes a quiz at a time when Facebook’s API gave out A LOT of data, then uses that data for purposes beyond the T&Cs. Takeaway: if you manage personal data, the burden is on you to work closely with researchers who want access to it, so that you can verify their legitimacy and their compliance with your T&Cs.
- Illegal Data on the Blockchain — “Our analysis shows that certain content—e.g., illegal pornography—can render the mere possession of a blockchain illegal. Based on these insights, we conduct a thorough quantitative and qualitative analysis of unintended content on Bitcoin’s blockchain. Although most data originates from benign extensions to Bitcoin’s protocol, our analysis reveals more than 1,600 files on the blockchain, over 99% of which are texts or images. Among these files there is clearly objectionable content such as links to child pornography, which is distributed to all Bitcoin participants.”
- A Look Back at PLATO (IEEE) — “Imagine if today, iOS or Linux had built-in libraries of code that allowed anyone to build a social application that didn’t require cutting a deal with Facebook or using their APIs,” Dear says. “[With PLATO], the API was in the operating system and it allowed any app to be social. That was kind of the assumption that the PLATO people had.” This was in the 1960s and 1970s.