Four Short Links

Nat Torkington's eclectic collection of curated links.

Four short links: 16 August 2018

Distributed Execution, Roaming SIM, Social Robot, and Bad Design

  1. Ray -- a flexible, high-performance distributed execution framework from UC Berkeley's RISELab, targeting AI applications including reinforcement learning. (via "Notes from the first Ray meetup")
  2. KnowRoaming Global SIM Sticker -- Put your SIM card back in your phone. When you’re at home, the sticker stays dormant. (via Engadget)
  3. Haru (IEEE Spectrum) -- inside Honda's new social robot.
  4. Botched CIA Communications System Helped Blow Agents' Cover (Foreign Policy) -- In the words of one of the former officials, the CIA had “fucked up the firewall” between the two systems. When bad systems architecture kills people...

Four short links: 15 August 2018

Retro Hacks, Timsort, e-ink UI, and Inside Time Zones

  1. TRS-80 Galaxy Invasion on an RC2014 -- I love these retro hacks. This uses a homebrew Z80 with a Raspberry Pi Zero (!) to do the video graphics, which is painful and burdensome otherwise.
  2. Timsort -- all you need to know about Python's sorting algorithm.
  3. PaperTTY -- Python module to render a TTY on e-ink.
  4. Working with Time Zones -- the graphs are such a brilliant way of explaining it!

Four short links: 14 August 2018

Hyrum's Law, Academic Torrents, Logic Textbook, and Suboptimal Fairness

  1. Hyrum's Law -- With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody. (via Simon Willison)
  2. Academic Torrents -- a community-maintained distributed repository for data sets and scientific knowledge. 27TB and growing.
  3. Open Logic Project -- a collection of teaching materials on mathematical logic aimed at a non-mathematical audience, intended for use in advanced logic courses as taught in many philosophy departments. It is open source: you can download the LaTeX code. It is open: you’re free to change it whichever way you like, and share your changes. It is collaborative: a team of people is working on it, using the GitHub platform, and we welcome contributions and feedback. And it is written with configurability in mind.
  4. Delayed Impact of Fair Machine Learning (Paper a Day) -- it’s therefore possible to have a fairness intervention with the unintended consequence of leaving the disadvantaged group worse off than they were before.

Four short links: 13 August 2018

Algorithms, Feedback, Transliteration, and Diagnosis

  1. Dijkstras in Disguise -- It turns out that many algorithms I've encountered in my computer graphics, finance, and reinforcement learning studies are all variations of this relaxation principle in disguise. [...] This blog post is a gentle tutorial on how all these varied CS topics are connected.
  2. The Blacker the Box -- The thesis of this post is: The faster the feedback on prediction accuracy, the blacker the box can be. The slower the feedback, the more your models should be explicit and formal.
  3. Design Challenges in Named Entity Transliteration -- In order to improve availability of bilingual named entity transliteration data sets, we release personal name bilingual dictionaries mined from Wikidata for English to Russian, Hebrew, Arabic, and Japanese Katakana. Our code and dictionaries are publicly available. GitHub.
  4. AI Spots Fibromyalgia -- A machine-learning algorithm that was programmed to recognize this neurological signature was able to use it to predict which brain scans were indicative of fibromyalgia and which were not. [...] López-Solà’s research is compelling evidence to convince those who are reluctant to accept the existence of fibromyalgia. Interesting because medical science argues whether the condition is real, but the software can reliably identify something.
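
The relaxation principle behind the Dijkstras in Disguise item can be sketched in a few lines of Python. This is a minimal illustration, not code from the post: the function name `dijkstra`, the adjacency-list layout, and the toy graph are all assumptions made for the example.

```python
import heapq


def dijkstra(graph, source):
    """Return shortest-path distances from source to every reachable node.

    graph maps each node to a list of (neighbor, weight) pairs;
    weights are assumed to be non-negative.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for v, w in graph.get(u, []):
            # Relaxation: keep the better of the current estimate for v
            # and the path that reaches v through u.
            candidate = d + w
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist


# Hypothetical toy graph, for illustration only.
example_graph = {
    "a": [("b", 1), ("c", 4)],
    "b": [("c", 2), ("d", 6)],
    "c": [("d", 3)],
    "d": [],
}
distances = dijkstra(example_graph, "a")
```

The single `if candidate < ...` step is the relaxation; the variations the post describes keep that step and change what "distance" and "edge" mean.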

Four short links: 10 August 2018

CS for DS, Operations Research, Coding Sandpit, and GitHub's Load Balancer

  1. The Best Books on Computer Science for Data Scientists (Hadley Wickham) -- a solid list.
  2. OR Tools -- an open source, fast, and portable software suite for solving combinatorial optimization problems.
  3. Pencil Code -- a collaborative programming site for drawing art, playing music, and creating games. It is also a place to experiment with mathematical functions, geometry, graphing, webpages, simulations, and algorithms. Programs are open for all to see and copy. In my head as a good "what next after Scratch?"
  4. GLB Director: GitHub Load Balancer -- a Layer 4 load balancer that scales a single IP address across a large number of physical machines while attempting to minimize connection disruption during any change in servers.

Four short links: 9 August 2018

Music Money, Faster Webpages, Sampling Neurons, and Catching Deepfakes

  1. How Musicians Make Money (Or Don’t at All) in 2018 (Rolling Stone) -- When you end up tracing all the dollars, around 10% of it gets captured by the artist. That’s amazingly low.
  2. Why AMP? -- an interesting answer from Hacker News to this question: AMP doesn't support a lot of the crap that makes webpages slow, so it's a way to say "computer says no" to feature requests that would slow page load time. If you can find a better way to convince large organizations that page load speed is a valuable metric, and more important than whatever other resource they want to load today, I'd love to hear it. But from what I've seen, AMP is the only thing that's had any success in tackling this problem.
  3. Neuropixels -- most electrode arrays are built in academic foundries and house 64 sensors in a 1,050-square-micron device. Neuropixels were designed and manufactured in a foundry called Imec, owned by the Flemish government. The Imec probe packs nearly 1,000 recording sites onto a single shank about 1,400 microns square and 10 millimeters long, which spans the full depth of a rat brain.
  4. DARPA's First Tools For Catching Deepfakes -- via the Media Forensics program. Others involved in the DARPA challenge are exploring similar tricks for automatically catching deepfakes: strange head movements, odd eye color, and so on. “We are working on exploiting these types of physiological signals that, for now at least, are difficult for deepfakes to mimic,” says Hany Farid, a leading digital forensics expert at Dartmouth University. I do hope their "what is not fake" data set includes non-neurotypical people.

Four short links: 8 August 2018

AI Patenting, Data Viz Errors, Developer Tool, and Origami-hand

  1. AI Patenting Up -- Facebook filed for 55 patents related to machine learning or neural networks in 2016, up from zero in 2010. IBM, which has been granted more U.S. patents than any other company for the past 25 years running, boasts that in 2017 it won 1,400 AI-related patents, more than ever before.
  2. Data Visualization Don'ts -- so good.
  3. Luna Studio -- a developer’s whiteboard on steroids. Design, prototype, develop, and refactor any application simply by connecting visual elements together. Collaborate with co-workers, interactively fine tune parameters, inspect the results, and visually profile the performance in real time. I'm very interested in IDE and tool improvements; they're force multipliers for programmers.
  4. Origami-hand -- Origami-hand is a disposable robot hand that folds and assembles paper. We aim to expand the application range of robots by realizing hands that can perform complicated operations at low cost. (via IEEE Spectrum)

Four short links: 7 August 2018

Azure Kubernetes, Systems Neuroscience, Facebook's TLS, and Speech Benchmark

  1. Horrors of using Azure Kubernetes Service in Production -- a cautionary tale.
  2. Systems Neuroscience is About to Get Bonkers -- Two advances are making the impending data surge an important issue to address now. The first is the advent of widely available, low-cost recording technologies that are easy to use, such as the Neuropixels probes. [...] The second is the maturation of deep learning, a catchphrase for a collection of very powerful artificial neural network concepts, along with the software and hardware that power them.
  3. Fizz -- Facebook's TLS 1.3 implementation, open-sourced. (via Facebook blog post)
  4. stt-benchmark -- a minimalist and extensible framework for benchmarking different speech-to-text engines.

Four short links: 6 August 2018

GPU Notebooks, Reproducible Experiments, History of Fears, and Debloating Code

  1. Kernels -- Kaggle hosting Jupyter Notebooks with GPUs.
  2. Speedrun -- A no-strings-attached toolkit to help you deploy and manage your machine learning experiments. The idea is to equip you with the tools you need to have well-documented and reproducible experiments going, but without getting in your way.
  3. Ancient Dreams of Intelligent Machines (Nature) -- the extraordinary history of cultural responses to automata.
  4. Trimmer -- an application specialization tool that leverages user-provided configuration data to specialize an application to its deployment context. The specialization process attempts to eliminate the application functionality that is unused in the user-defined context. Our evaluation demonstrates Trimmer can effectively reduce code bloat. For 13 applications spanning various domains, we observe a mean binary size reduction of 21% and a maximum reduction of 75%. Description of how it works, but no source, alas.

Four short links: 3 August 2018

Text to Faces, Cartoon Camera, Change Tracking, and Quantum Mechanics Toys

  1. text2face -- turns a textual description of a face into a photo. Creates rather ghostly images now, but you can easily see this as the start of an automated identikit.
  2. Draw This -- an instant camera that draws cartoons. A whimsical deep learning application.
  3. Orbit -- a composable framework for orchestrating change processing, tracking, and synchronization across multiple data sources.
  4. These Quantum Mechanics Toys Didn't Catch On (IEEE) -- But the quantum toys proved a pedagogical nonstarter. Papaliolios never published an instruction manual for them or a paper describing their potential value in the classroom. When he lent them to colleagues or students, they were met with confusion or indifference. I'm imagining a CS Unplugged for quantum mechanics ...

Four short links: 2 August 2018

Problem Recognition, Evolving Floorplans, Google in China, and The Bullshit Web

  1. Clues and Signals -- a product manager's pattern library for when things are going to go wrong. So on-point it burns. Samples: Might as well [do some extra thing] while we [do the original thing]. It's too early to [some interaction with users/customers]. We need some quick wins because [normal wins take too long].
  2. Evolving Floorplans -- The rooms and expected flow of people are given to a genetic algorithm which attempts to optimize the layout to minimize walking time, the use of hallways, etc.
  3. Google Returning to China, with Censorship (The Intercept) -- “I’m against large companies and governments collaborating in the oppression of their people, and feel like transparency around what’s being done is in the public interest,” the source said, adding that they feared “what is done in China will become a template for many other nations.” Nicely said. “Organize the world’s information and make it universally accessible and useful,” with a few key caveats.
  4. The Bullshit Web -- An honest web is one in which the overwhelming majority of the code and assets downloaded to a user’s computer are used in a page’s visual presentation, with nearly all the remainder used to define the semantic structure and associated metadata on the page. Bullshit — in the form of CPU-sucking surveillance, unnecessarily-interruptive elements, and behaviours that nobody responsible for a website would themselves find appealing as a visitor — is unwelcome and intolerable.

Four short links: 1 August 2018

Data Science Ethics, Bandit Algorithms, Formal Methods, and FAST Goals

  1. Data's Day of Reckoning (Loukides, Mason, Patil) -- Data science, machine learning, artificial intelligence, and related technologies are now facing a day of reckoning. It is time for us to take responsibility for our creations. What does it mean to take responsibility for building, maintaining, and managing data, technologies, and services?
  2. Bandit Algorithms (Tor Lattimore) -- A practitioner seeking to apply a bandit algorithm must understand which assumptions in the theory are important and how to modify the algorithm when the assumptions change. We hope this book can provide that understanding. Bandit algorithms make decisions with partial information, taking into account the cost of getting more information.
  3. Augmenting Agile with Formal Methods -- The difference between writing TLA+ and just writing unit tests isn’t half an hour versus sixteen hours, it’s half an hour versus “Two weeks to realize there’s a bug, a week to find the bug, three days to understand the bug, sixteen hours to write the test, twenty minutes to run the test, and you don’t know if your fix really works.”
  4. FAST Goals Beat SMART Goals (MIT Sloan Review) -- FAST = Frequently-discussed, Ambitious, Specific, and Transparent. (via Helen Bevan)
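
To make "decisions with partial information" from the Bandit Algorithms item concrete, here is a minimal sketch of epsilon-greedy, one of the simplest bandit strategies. It is not taken from the book: the function name `epsilon_greedy`, the Bernoulli reward model, and the arm probabilities are illustrative assumptions.

```python
import random


def epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Play a Bernoulli bandit with an epsilon-greedy policy.

    true_means holds each arm's (hidden) success probability.
    Returns per-arm pull counts, per-arm reward estimates, and total reward.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pay the cost of gathering more information.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pull the arm that currently looks best.
            arm = max(range(n_arms), key=lambda i: estimates[i])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean reward for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return counts, estimates, total_reward


# Hypothetical arm probabilities, for illustration only.
counts, estimates, total_reward = epsilon_greedy([0.2, 0.5, 0.8])
```

The `epsilon` parameter is the explicit price of information: a higher value learns the arms faster but spends more pulls on arms already known to be worse.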

Four short links: 31 July 2018

Quantum Circuits, Site Reliability, AI and Board Games, and Designing for Use

  1. Quirk -- a drag-and-drop quantum circuit simulator. (via Hacker News)
  2. Site Reliability Workbook -- how to implement SRE at your org, available online for free until August 23 or always from Amazon.
  3. Blood Bowl: The Next Board Game Challenge for AI -- At first sight, the game ought to be approachable by numerous game-playing algorithms. However, as all pieces on the board belonging to a player can be moved several times each turn, the turn-wise branching factor becomes overwhelming for traditional algorithms. Additionally, scoring points in the game is rare and difficult, which makes it hard to design heuristics for search algorithms or apply reinforcement learning.
  4. How We Redesigned Dropbox for Rapid Mobile Work -- You’re no longer designing to extend engagement or keep customers in the app. Rather, you’re helping people get in and out and done. Please, more design for this goal.

Four short links: 30 July 2018

Lightbulb Moments, Crowd Simulations, Secrets and Accountability, and Whole Genome Sequencing

  1. Phoebus Cartel (Wikipedia) -- a fascinating piece of history I never knew about: lightbulb manufacturers banding together to shorten lifetimes and raise profits. Reminds me of Adam Smith's great line: "People of the same trade seldom meet together, even for merriment and diversion, but the conversation ends in a conspiracy against the public, or in some contrivance to raise prices."
  2. The Wisdom And/Or Madness of Crowds -- nifty little interactive that gives you a deeper understanding of network theory. I am a huge believer in simulations as teaching mechanism.
  3. Practical Accountability of Secret Processes -- aimed at the court system. We illustrate how accountability and secrecy are simultaneously achievable when modern cryptography is brought to bear. Our system improves configurability while preserving secrecy, offering new tradeoffs potentially more palatable to the risk-averse court system.
  4. Dante Labs Whole Genome Sequencing -- get every base pair in your DNA sequenced for USD500, and own the data (unlike 23andMe and similar services, which only test some known single-base-pair markers, and then own and resell your data). Making sense of the data left as an exercise to the reader.

Four short links: 27 July 2018

Security Keys, Speech Recognition, Kubernetes Security, and Strategic Competition

  1. Security Keys Neutralized Employee Phishing -- Google has not had any of its 85,000+ employees successfully phished on their work-related accounts since early 2017, when it began requiring all employees to use physical Security Keys in place of passwords and one-time codes.
  2. The Accent Gap (WaPo) -- We tested Amazon's Alexa and Google's Home to see how people with accents are getting left behind in the smart-speaker revolution. Ok, so first up: everyone has an accent, it's just that software has been preferentially trained on some of them. But also: I know teachers who use Android's voice recognition to help students acquire a mainstream accent, a useful skill for them to have. This isn't all bad.
  3. Kubernetes Security Best Practices -- my goal in this article is to cover some common security mistakes I have observed and offer some general best practices around securing Kubernetes clusters and workloads.
  4. Strategic Competition in an Age of AI -- Software often diffuses much more easily than hardware, both because of the commercial incentives that can drive software creation and because the talent pool necessary to create new software can exist even in countries that are not generally major military producers, such as advanced economies in Asia. The key elements of national power in AI are therefore related to the question of whether it makes sense to think about AI as software or hardware.

Four short links: 26 July 2018

Events Defense, Reading List, Go Cloud, and Operating Systems Book

  1. When The Nazis Show Up -- an organizer's perspective on what happens at a conference when the white supremacists show up. A lesson for all of us event organizers. As he says, [U]sing your rules and norms against you is an alt-right go-to. [...] And, honestly, most conferences don't model for these kinds of threats. They model for "drunk dude groping the presenter" and "racist greybeard drops n-bombs."
  2. YC's Summer Reading List -- my favorite session to host at unconferences is "What Are You Reading?", and here's one from the YC folks. Refreshingly short on the "meditate yourself rich with keto mindfulness training!" business/self-help schlock.
  3. Go Cloud -- the promise is to write vendor-neutral cloud apps in Go. We have identified common services used by cloud applications and have created generic APIs to work across cloud providers. Today, Go Cloud is launching with blob storage, MySQL database access, runtime configuration, and an HTTP server configured with request logging, tracing, and health checking. Go Cloud offers support for Google Cloud Platform (GCP) and Amazon Web Services (AWS). We plan to work with cloud industry partners and the Go community to add support for additional cloud providers very soon.
  4. Operating Systems: Three Easy Pieces -- a free online operating systems book! The book is centered around three conceptual pieces that are fundamental to operating systems: virtualization, concurrency, and persistence. In understanding the conceptual, you will also learn the practical, including how an operating system does things like schedule the CPU, manage memory, and store files persistently. Lots of fun stuff!

Four short links: 25 July 2018

Quantum Computing, A/B Tests, Rockstar Programming Language, and Git Solutions

  1. Strawberry Fields -- a full-stack Python library for designing, simulating, and optimizing continuous variable (CV) quantum optical circuits. (via Hacker News)
  2. p-Hacking and False Discovery in A/B Tests -- Experimenters indeed p-hack, especially for positive effects. Specifically, about 57% of experimenters p-hack when the experiment reaches 90% confidence. Furthermore, approximately 70% of the effects are truly null, and p-hacking increases the false discovery rate (FDR) from 33% to 42% among experiments p-hacked at 90% confidence. Assuming that false discoveries cause experimenters to stop exploring for more effective treatments, we estimate the expected cost of a false discovery to be a loss of 1.95% in lift, which corresponds to the 76th percentile of observed lifts. But it feels good to optimize your product with data, and that's what counts.
  3. Rockstar -- a dynamically typed Turing-complete programming language. Rockstar is designed for creating computer programs that are also song lyrics, and is heavily influenced by the lyrical conventions of 1980s hard rock and power ballads.
  4. 10 Common Git Problems and How to Fix Them -- for every git newcomer.

Four short links: 24 July 2018

Data Transfer, Quantum Computing, Optimal Control Theory, and Observability

  1. Data Transfer Project -- Facebook, Google, Microsoft, and Twitter collaborating on a data interchange project. Data Transfer Project (DTP) is a collaboration of organizations committed to building a common framework with open source code that can connect any two online service providers, enabling a seamless, direct, user-initiated portability of data between the two platforms.
  2. Getting Started with Quantum Computing in Python -- In this tutorial, we’ll go through how you can program a simple quantum computer to generate random numbers. (via Hacker News)
  3. Introduction to Mathematical Optimal Control Theory -- lecture notes. In the words of one HN commenter, reinforcement learning and OCT are attempting to solve the same problem: choose the optimal action to take at the current time for a given process. Control theorists normally start out with a model, or a family of potential models that describe the behavior of the process and work from there to determine the optimal action. This is very much an area of applied mathematics, and academics take rigorous approaches, but, in industry, many engineers just use a PID or LQR controller and call it a day, regardless of how applicable they are to the actual system theoretically. Meanwhile, the reinforcement learning folk typically work on problems where the models are too complicated to work with computationally or often even to write down, so a more tractable approach is to learn a model and control policy from data.
  4. Veneur -- Stripe's distributed, fault-tolerant pipeline for observability data.
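
For readers who haven't met the PID controller mentioned in the optimal control item, here is a minimal sketch. It does not come from the lecture notes or the HN thread: the `PID` class, the gains, and the toy first-order plant (dx/dt = -x + u) are assumptions chosen only so the loop is easy to follow.

```python
class PID:
    """Textbook proportional-integral-derivative controller (discrete time)."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        # Skip the derivative on the first call to avoid a startup spike.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(setpoint=1.0, steps=2000, dt=0.01):
    """Drive a hypothetical plant dx/dt = -x + u toward the setpoint."""
    pid = PID(kp=2.0, ki=1.0, kd=0.1, dt=dt)  # illustrative gains
    x = 0.0
    for _ in range(steps):
        u = pid.update(setpoint, x)
        x += (-x + u) * dt  # forward-Euler step of the toy plant
    return x


final_state = simulate()
```

Note that the controller never looks at the plant model, only at the error signal, which is the "call it a day" appeal the commenter describes.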

Four short links: 23 July 2018

State Sponsored Trolling, Public Standards, Explorable Explanations, and iOS Network Debugging

  1. State Sponsored Trolling (Institute For The Future) -- authoritarians around the world have mastered social media. Bloomberg did some great follow-up work on the IFTF report. (via Cory Doctorow)
  2. Public Resource Wins Right to Publish Standards Used in Law -- The question in this case is whether private organizations whose standards have been incorporated by reference can invoke copyright and trademark law to prevent the unauthorized copying and distribution of their works. [...] Because the district court erred in its application of both fair use doctrines, we reverse and remand, leaving for another day the far thornier question of whether standards retain their copyright after they are incorporated by reference into law.
  3. Explorable Explanations -- explanations and simulators for things to help you learn them. Regular readers will know I'm a huge fan of simulations as learning tools.
  4. Wormholy -- debug network calls in iOS apps from within the app: Add it to your project, and that's all! Shake your device or your simulator and Wormholy will appear. In case, for whatever reason, the Charles proxy doesn't do it for you.

Four short links: 20 July 2018

Convolutional Architectures, GPU Language, Acoustic Scenes, and Cybersecurity Numbers

  1. DARTS: Differentiable Architecture Search -- our algorithm excels in discovering high-performance convolutional architectures for image classification and recurrent architectures for language modeling, while being orders of magnitude faster than state-of-the-art non-differentiable techniques. And runs on a single GPU. Open source.
  2. The Spiral Language -- a functional language designed for GPUs by emphasizing inlining (GPUs don't have great stacks, so compilers have to handle subroutines carefully and differently than traditional architectures). Inlining is a trade-off that expresses the exchange of memory for computation. It should be the default instead of heap allocating.
  3. DCASE: Detection and Classification of Acoustic Scenes and Events -- workshops and a community for the researchers working on making sense of audio.
  4. Cybersecurity: Data, Statistics, and Glossaries (FAS) -- This report describes data and statistics from government, industry, and information technology (IT) security firms regarding the current state of cybersecurity threats in the United States and internationally. These include incident estimates and costs, and annual reports on data security breaches, identity thefts, cybercrimes, malware, and network securities.