Four Short Links

Nat Torkington's eclectic collection of curated links.

Four short links: 27 Feb 2017

Meta Language, PDF Extraction, Collaborative Editing, Programming Principles

  1. A Representation Language Language (PDF) -- 1980 CS paper that really should have led with the "there's no problem that can't be solved with another layer of indirection" line.
  2. pdftabextract -- tools written in Python 3 with the aim to extract tabular data from (OCR-processed) PDF files.
  3. ChainPad -- real-time collaborative editor algorithm based on Nakamoto blockchains.
  4. Id Software Programming Principles -- As soon as you see a bug, you fix it. Do not continue on. If you don’t fix your bugs your new code will be built on a buggy codebase and ensure an unstable foundation. See a snake, kill a snake.

Four short links: 24 Feb 2017

Apple DRM, Automatic Forecasting, Conversation API, and API Idempotency

  1. Apple's SSAFE DRM -- development notes from a 1979-80 anti-piracy project, discovered in an interesting fashion. (via BoingBoing)
  2. Prophet -- open source forecasting procedure implemented in Python and R. It is fast and provides completely automated forecasts that can be tuned by hand by data scientists and analysts. From Facebook. (via Sean Taylor)
  3. Perspective API -- an API that makes it easier to host better conversations. The API uses machine learning models to score the perceived impact a comment might have on a conversation. Developers and publishers can use this score to give real-time feedback to commenters or help moderators do their job, or allow readers to more easily find relevant information, as illustrated in two experiments below. We’ll be releasing more machine learning models later in the year, but our first model identifies whether a comment could be perceived as “toxic” to a discussion.
  4. Idempotency -- nothing's reliable, so it’s important to design APIs and clients that will be robust in the event of failure, and will predictably bring a complex integration to a consistent state despite them. Let’s take a look at a few ways to do that.
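
One common way to get the robustness described in the Idempotency item is an idempotency key: the client attaches one unique key to each logical operation and reuses it on every retry, and the server replays the stored response instead of repeating the side effect. The sketch below is a toy in-memory illustration of that idea, not code from the linked article; `PaymentServer`, `FlakyTransport`, and `charge_with_retries` are hypothetical names invented for this example.

```python
import uuid


class PaymentServer:
    """Toy in-memory server that deduplicates requests by idempotency key."""

    def __init__(self):
        self._responses = {}   # idempotency key -> stored response
        self.charges_made = 0  # counts real side effects

    def charge(self, idempotency_key, amount_cents):
        # A retried request with a known key replays the stored response
        # instead of charging again.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]
        self.charges_made += 1
        response = {"charge_id": self.charges_made, "amount_cents": amount_cents}
        self._responses[idempotency_key] = response
        return response


class FlakyTransport:
    """Hypothetical network: delivers every request, loses the first `drops` responses."""

    def __init__(self, server, drops):
        self.server = server
        self.drops = drops

    def send(self, key, amount_cents):
        response = self.server.charge(key, amount_cents)
        if self.drops > 0:
            self.drops -= 1
            raise ConnectionError("response lost in transit")
        return response


def charge_with_retries(transport, amount_cents, max_attempts=5):
    key = str(uuid.uuid4())  # one key per logical operation, reused on retry
    for _ in range(max_attempts):
        try:
            return transport.send(key, amount_cents)
        except ConnectionError:
            continue
    raise RuntimeError("gave up after %d attempts" % max_attempts)


server = PaymentServer()
result = charge_with_retries(FlakyTransport(server, drops=2), 1500)
```

Here the client sends the request three times, because the first two responses are lost, but `server.charges_made` stays at 1: the retries are safe because the key, not the request count, identifies the operation.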

Four short links: 23 Feb 2017

Arduino & Pi Bundle, Encryption Primer, Travel Mode, and API Design

  1. Make Arduino and Raspberry Pi Humble Bundle -- cornucopia of great Make books.
  2. Nuts and Bolts: An Encryption Primer (Ed Felten) -- a straightforward introduction to encryption, as it is implemented in modern systems, at a level of detail suitable for policy discussions. No prior background on encryption or data security is assumed.
  3. Social Media Needs a Travel Mode (Maciej Ceglowski) -- We need a 'trip mode' for social media sites that reduces our contact list and history to a minimal subset of what the site normally offers. Not only would such a feature protect people forced to give their passwords at the border, but it would mitigate the many additional threats to privacy they face when they use their social media accounts away from home.
  4. Google API Design Guide -- a general design guide for networked APIs. It has been used inside Google since 2014 and is the guide we follow when designing Cloud APIs and other Google APIs. It is shared here to inform outside developers and to make it easier for us all to work together.

Four short links: 22 Feb 2017

Delayed Feedback, Post-Human World, APL in R, and Demand-Driven Digitized Markets

  1. Why We're Suspicious of Immediate Feedback -- Simmons and Cope (1993) found that students were more likely to use procedural strategies like trial and error in a condition of immediate feedback than a condition of delayed feedback.
  2. The Post-Human World (The Atlantic) -- interview with Yuval Harari, whose Sapiens was so damn good. My nephew and these children got into a bit of a fight because they were trying to capture the same invisible creatures. It seemed strange to me. But these Pokémon were very real to the children. And then it hit me: This is just like the Israeli-Palestinian conflict! You have two sides fighting over something that I cannot see. I look at the stones of buildings in Jerusalem and I just see stones. But Christians, Jews, and Muslims who look at the same stones see a holy city. It’s their imagination, but they are willing to kill for it. That’s virtual reality, too.
  3. APL in R -- and you thought R was hard to learn.
  4. Manifestos and Monopolies -- In this brave new world, power comes not from production, not from distribution, but from controlling consumption: all markets will be demand driven; the extent to which they already are is a function of how digitized they have become.

Four short links: 21 Feb 2017

Fact Checking, Simulated Universes, Radio Hacking, and Fly Brain Hackathon

  1. Expanding Fact Checking at Google -- We’re able to do this work because the fact check industry itself has grown. This starts the most interesting section of the post: Google is part of joint projects to debunk myths around the French elections, and is funding a variety of projects around Europe.
  2. Simulated Universes (We Make Money Not Art) -- This year, a section of the exhibition of the GAMERZ festival was dedicated to the omnipresence of algorithms into our life. It was curated by artist, writer and otherwise brilliant cultural agitator Ewen Chardronnet. Summary of the different art projects, and Chardronnet's approach.
  3. Universal Radio Hacker -- software for investigating unknown wireless protocols. Features include hardware interfaces for common Software Defined Radios; easy demodulation of signals; assigning participants to keep overview of your data; customizable decodings to crack even sophisticated encodings like CC1101 data whitening; assign labels to reveal the logic of the protocol; fuzzing component to find security leaks; modulation support to inject the data back into the system.
  4. Fruit Fly Brain Hackathon 2017 -- This hackathon will feature the Fruit Fly Brain Observatory (FFBO) and its key components NeuroNLP and NeuroGFX. The former allows for exploring fruit fly brain data using plain English queries, and the latter facilitates the modeling and execution of such brain circuits. Brief tutorials will be given on the usage of the FFBO as well as developing new tools/features in FFBO.

Four short links: 20 Feb 2017

Car Security, Civ Math, Free Mindstorms, and Chinese AI Research

  1. Used Cars Still Controllable From Previous Owners' Phone -- “The car is really smart, but it’s not smart enough to know who its owner is, so it’s not smart enough to know it’s been resold,” Henderson told CNNTech. “There’s nothing on the dashboard that tells you ‘the following people have access to the car.’”
  2. Mathematics of Civilization V -- a beautiful obsession. (Theoretically beautiful. The page is full of LaTeX-rendered math and graphs, and is less than beautiful.)
  3. Seymour Papert's Mindstorms, Free -- classic, available online as a PDF.
  4. China's AI Research (The Atlantic) -- Yet as the research matures in China, Ng says, it is also becoming its own distinct community. After a recent international meeting in Barcelona, he recalls seeing Chinese language write-ups of the talks circulate right away. He never found any in English. The language issue creates a kind of asymmetry: Chinese researchers usually speak English so they have the benefit of access to all the work disseminated in English. The English-speaking community, on the other hand, is much less likely to have access to work within the Chinese AI community.

Four short links: 17 February 2017

Robot Governance, Emotional Labour, Predicting Personality, and Music History

  1. Who Should Own the Robots? (Tyler Cowen) -- what is government in a world where everything is done by the robots? [...] Say there are 50 people in the government, and they allocate the federal budget subject to electoral constraints. Even a very small percentage of skim makes them fantastically wealthy, and gives them all sorts of screwy incentives to hold on to power. If they can, they will manipulate robot software toward that end. Designing governance for The Robot Future is definitely a Two Beer Problem.
  2. Emotional Labour for GMail -- Automate emotional labor in Gmail messages.
  3. Beyond the Words: Predicting User Personality from Heterogeneous Information -- we propose a Heterogeneous Information Ensemble framework, called HIE, to predict users’ personality traits by integrating heterogeneous information, including self-language usage, avatar, emoticon, and responsive patterns. In our framework, to improve the performance of personality prediction, we have designed different strategies extracting semantic representations to fully leverage heterogeneous information on social media. (via Adrian Colyer)
  4. Theft: A History of Music -- a graphic novel laying out a 2,000-year-long history of music, from Plato to rap. The comic is by James Boyle, Jennifer Jenkins, and the late Keith Aoki. You can buy print, or download for free.

Four short links: 16 February 2017

Memory-Busting Javascript, Taobao Villages, Drone Simulation, and Bio Bots

  1. ASLR-Busting Javascript (Ars Technica) -- modern chips randomize where your programs actually live in memory, to make it harder for someone to overwrite your code. This clever hack (in Javascript!) makes the CPU cache reveal (through faster returns) where your code is. I'm in awe.
  2. China's Taobao Villages (Quartz) -- Today, the township and its surrounding area are China’s domestic capital for one rather specific category of products: acting and dance costumes. Half of the township’s 45,000 residents produce or sell costumes—ranging from movie-villain attire to cute versions of snakes, alligators, and monkeys—that are sold on Alibaba-owned Taobao, the nation’s largest e-commerce platform.
  3. Aerial Informatics and Robotics platform (Microsoft) -- open source drone simulator.
  4. How to Build Your Own Bio Bot (Ray Kurzweil) -- researchers are sharing a protocol with engineering details for their current generation of millimeter-scale soft robotic bio-bots.

Four short links: 15 Feb 2017

Docker Data, Smart Broadcasting, Open Source, and Cellphone Spy Tools

  1. Docker Data Kit -- Connect processes into powerful data pipelines with a simple git-like filesystem interface.
  2. RedQueen: An online algorithm for smart broadcasting in social networks (Adrian Colyer) -- This paper starts out with a simple question “when’s the best time to tweet if you want to get noticed?,” detours through some math involving “solving a novel optimal control problem for a system of jump stochastic differential equations (SDEs),” and pops out again on the other side with a simple online algorithm called RedQueen.
  3. Open Source Guides -- GitHub's guide to making and contributing to open source. GitHub's is nicely packaged into visual and consumable chunks, but I still prefer (newly updated) Producing Open Source Software. The more people know how to do open source, the better.
  4. Cellphone Spy Tools Flood Local Police Departments -- caught my eye because I'm pondering visiting the U.S. this year, and I'm not a fan of surrendering devices for search. My current line of thought is: if CBP/popo are going to take a device from me and plug it into their software, hardware, and network ... it just has to look like a phone. Next challenge: making a large capacitor look like an unlocked iPhone.

Four short links: 14 Feb 2017

Rapping Neural Network, H1B Research, Quantifying Controversy, Social Media Research Tools

  1. Rapping Neural Network -- It's a neural network that has been trained on rap songs, and can use any lyrics you feed it and write a new song (it now writes word by word as opposed to line by line) that rhymes and has a flow (to an extent). With examples.
  2. H1B Research -- H1B holders are paid less and are often weaker in skills compared to their American counterparts.
  3. Amazon Chime -- interesting to see a business service from Amazon, not an operations service. This is better (they claim) meeting software: move between devices, with screen-sharing, video, chat, file-sharing.
  4. Quantifying Controversy in Social Media -- The research is carried out in the context of Twitter, but in theory can be applied to any social graph structure. A topic is simply defined as a query, often a hashtag. Given a query, we can build a conversation graph with vertices representing users, and edges representing activity and interactions between users. Using a graph partitioning algorithm, we can then try to partition the graph in two. If the partitions separate cleanly, then we have a good indication that the topic is controversial and has polarized opinions.
  5. Social Media Research Toolkit -- a list of 50+ social media research tools curated by researchers at the Social Media Lab at Ted Rogers School of Management, Ryerson University. The kit features tools that have been used in peer-reviewed academic studies. Many tools are free to use and require little or no programming. Some are simple data collectors such as tweepy, a Python library for collecting Tweets, and others are a bit more robust, such as Netlytic, a multi-platform (Twitter, Facebook, and Instagram) data collector and analyzer, developed by our lab. All of the tools are confirmed available and operational.
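
The partitioning idea in the Quantifying Controversy item can be illustrated with a small stdlib-only sketch: build a conversation graph, split it in two, and count how many edges cross the split. The bisection here is a naive greedy pairwise-swap heuristic (a simplified Kernighan-Lin pass) chosen for brevity; it is an assumption of this example, not the partitioning algorithm or the controversy measure used in the paper, and the user names and edges are made up.

```python
from itertools import combinations


def build_graph(edges):
    """Undirected adjacency sets from (user, user) interaction pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def cut_size(graph, side_a):
    """Number of edges with exactly one endpoint in side_a."""
    return sum(1 for u in side_a for v in graph[u] if v not in side_a)


def bisect(graph):
    """Greedy pairwise-swap bisection; keeps both sides the same size."""
    nodes = sorted(graph)
    side_a = set(nodes[: len(nodes) // 2])
    side_b = set(nodes[len(nodes) // 2:])

    def d(node, own_side):
        # External degree minus internal degree (assumes no self-loops).
        return len(graph[node]) - 2 * len(graph[node] & own_side)

    while True:
        best_gain, best_pair = 0, None
        for a in sorted(side_a):
            for b in sorted(side_b):
                # Reduction in cut size if a and b trade places.
                gain = d(a, side_a) + d(b, side_b) - 2 * (b in graph[a])
                if gain > best_gain:
                    best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            return side_a, side_b  # no swap improves the cut
        side_a ^= set(best_pair)   # swap the pair between sides
        side_b ^= set(best_pair)


# Hypothetical data: two tightly knit camps with one cross-camp interaction.
camp_1 = ["ann", "cat", "eve", "gus"]
camp_2 = ["bob", "dan", "fay", "hal"]
edges = list(combinations(camp_1, 2)) + list(combinations(camp_2, 2))
edges.append(("ann", "bob"))

graph = build_graph(edges)
side_a, side_b = bisect(graph)
crossing = cut_size(graph, side_a)
```

On this toy graph the heuristic recovers the two camps, and only 1 of the 13 edges crosses the split, which is the kind of clean separation the item describes as a sign of a polarized topic. A swap heuristic like this can stall in a local optimum on real graphs, so treat it as a demonstration of the idea only.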

Four short links: 13 Feb 2017

Urban Attractors, Millimetre-Scale Computing, Ship Small Code, and C++ Big Data

  1. Urban Attractors: Discovering Patterns in Regions of Attraction in Cities -- We use a hierarchical clustering algorithm to classify all places in the city by their features of attraction. We detect three types of Urban Attractors in Riyadh during the morning period: Global, which are significant places in the city; Downtown, which is the central business district; and Residential attractors. In addition, we uncover what makes these places different in terms of attraction patterns. We used a statistical significance testing approach to rigorously quantify the relationship between Points of Interest (POI) types (services) and the three patterns of Urban Attractors we detected.
  2. Millimetre-Scale Deep Learning -- Another micro mote they presented at the ISSCC incorporates a deep-learning processor that can operate a neural network while using just 288 microwatts.
  3. Ship Small Diffs (Dan McKinley) -- your deploys should be measured in dozens of lines of code rather than hundreds. [...] In online systems, you have to ship code to prove that it works. [...] Your real problem is releasing frequently. So quotable, so good.
  4. Thrill -- distributed big data batch computations on a cluster of machines ... in C++. (via Harris Brakmic)

Four short links: 10 Feb 2017

Microsoft Graph Engine, Data Exploration, Godel Escher Bach, and Docker Secrets

  1. Microsoft Graph Engine -- open source (Windows now, Unix coming) graph data engine. It's the open source implementation of Trinity: A Distributed Graph Engine on a Memory Cloud.
  2. Superset -- Airbnb's data exploration platform, designed to be visual, intuitive, and interactive, now with a better SQL IDE.
  3. MIT Godel Escher Bach Lectures -- not Hofstadter himself, but a thorough walkthrough of the premises and ideas in the book.
  4. Docker Secrets Management -- interesting to see etcd getting some competition here.

Four short links: 9 February 2017

In-Memory Malware, Machine Ethics, Open Source Maintainer's Dashboard, and Cards Against Silicon Valley

  1. In-Memory Malware Infesting Banks (Ars Technica) -- According to research Kaspersky Lab plans to publish Wednesday, networks belonging to at least 140 banks and other enterprises have been infected by malware that relies on the same in-memory design [as Stuxnet] to remain nearly invisible. (via Boing Boing)
  2. Technical Challenges in Machine Ethics (Robohub) -- interesting interview with a researcher who is attempting to implement ethics in software. Fascinating to read about the approach and challenges.
  3. Scope -- nifty tool to help busy open source maintainers stay on top of their GitHub-hosted projects...dashboard for critical issues, PRs, etc.
  4. Cards Against Silicon Valley -- spot on tragicomedy.

Four short links: 8 February 2017

Becoming a Troll, Magic Paper, HTTPS Interception, and Deep NLP

  1. Anyone Can Become a Troll (PDF) -- A predictive model of trolling behavior shows that mood and discussion context together can explain trolling behavior better than an individual’s history of trolling. These results combine to suggest that ordinary people can, under the right circumstances, behave like trolls. (via Marginal Revolution)
  2. Magic Paper -- printed with light, erased with heat, and reusable up to 80 times. (via Slashdot)
  3. The Security Implication of HTTPS Interception (PDF) -- We find more than an order of magnitude more interception than previously estimated and with dramatic impact on connection security. To understand why security suffers, we investigate popular middleboxes and clientside security software, finding that nearly all reduce connection security and many introduce severe vulnerabilities. Drawing on our measurements, we conclude with a discussion on recent proposals to safely monitor HTTPS and recommendations for the security community.
  4. Deep Natural Language Processing Course -- This repository contains the lecture slides and course description for the Deep Natural Language Processing course offered in Hilary Term 2017 at the University of Oxford.

Four short links: 7 February 2017

Game Theory, Algorithms and Robotics, High School Not Enough, and RethinkDB Rises

  1. Game Theory in Practice (The Economist) -- various firms around the world offering simulations/models of scenarios like negotiations, auctions, regulation, to figure out strategies and likely courses of action from other players.
  2. Videos from the 12th Workshop on Algorithmic Foundations of Robotics -- there are plenty with titles like "non-Gaussian belief spaces" (possibly a description of modern America) but also keynotes with titles like Replicators, Transformers, and Robot Swarms.
  3. No Jobs for High School Grads (NYT) -- “In our factories, there’s a computer about every 20 or 30 feet,” said Eric Spiegel, who recently retired as president and chief executive of Siemens USA. “People on the plant floor need to be much more skilled than they were in the past. There are no jobs for high school graduates at Siemens today.”
  4. The Liberation of RethinkDB -- The Linux Foundation bought the IP after the startup wound up, and it's now run as an open source project via the Cloud Native Computing Foundation, all with the support of the founder and community. Happy story for everyone but the investors in RethinkDB. Also worth noting: RethinkDB is in a competitive space ("NoSQL stuff") and stands out so much that real money went to rescuing it from the startup deadpool.

Four short links: 6 February 2017

NPC AI, Deep Learning Math Proofs, Amazon Antitrust, and Code is Law

  1. Building Character AI Through Machine Learning -- NPCs that learn from/imitate humans. (via Greg Borenstein)
  2. Network-Guided Proof Search -- We give experimental evidence that with a hybrid, two-phase approach, deep-learning-based guidance can significantly reduce the average number of proof search steps while increasing the number of theorems proved.
  3. Amazon's Antitrust Paradox -- This Note maps out facets of Amazon’s dominance. Doing so enables us to make sense of its business strategy, illuminates anticompetitive aspects of Amazon’s structure and conduct, and underscores deficiencies in current doctrine. The Note closes by considering two potential regimes for addressing Amazon’s power: restoring traditional antitrust and competition policy principles or applying common carrier obligations and duties. Fascinating overview of the American conception of antitrust.
  4. FBI's RAP-BACK Program -- software encodes "guilty before trial." Employers enrolled in federal and state Rap Back programs receive ongoing, real-time notifications and updates about their employees’ run-ins with law enforcement, including arrests at protests and charges that do not end up in convictions.

Four short links: 3 February 2017

Stream Alerting, Probabilistic Cognition, Migrations at Scale, and Interactive Machine Learning

  1. StreamAlert -- a serverless, real-time data analysis framework that empowers you to ingest, analyze, and alert on data from any environment, using data sources and alerting logic you define. Open source from Airbnb.
  2. Probabilistic Models of Cognition -- we explore the probabilistic approach to cognitive science, which models learning and reasoning as inference in complex probabilistic models. In particular, we examine how a broad range of empirical phenomena in cognitive science (including intuitive physics, concept learning, causal reasoning, social cognition, and language understanding) can be modeled using a functional probabilistic programming language called Church.
  3. Online Migrations at Scale -- In this post, we’ll explain how we safely did one large migration of our hundreds of millions of Subscriptions objects. This is a solid process.
  4. Interactive Machine Learning (Greg Borenstein) -- intro to, and overview of, the field of Interactive Machine Learning, elucidating the principles for designing systems that let humans use these learning systems to do things they care about. In Greg's words, Machine learning has the potential to be a powerful tool for human empowerment, touching everything from how we shop to how we diagnose disease to how we communicate. To build these next thousand projects in a way that capitalizes on this potential, we need to learn not just how to teach the machines to learn but how to put the results of that learning into the hands of people.

Four short links: 2 February 2017

Physical Authentication, Crappy Robots, Immigration Game, and NN Flashcards

  1. Pervasive, Dynamic Authentication of Physical Items (ACM Queue) -- Silicon PUF circuits generate output response bits based on a silicon device's manufacturing variation. This is cute!
  2. Hebocon -- crappy robot competition.
  3. Martian Immigration Nightmare -- a game can make a point.
  4. TraiNNing Cards -- flashcards for neural networks. Hilarious!

Four short links: 1 February 2017

Unhappy Developers, Incident Report, Compliance as Code, AI Ethics

  1. Unhappy Developers -- paper authors surveyed 181 developers and built a framework of consequences: Internal Consequences, such as low cognitive performance, mental unease or disorder, low motivation; External Consequences, which might be Process-related (low productivity, delayed code, variation from the process) or Artefact-related (low-quality code, rage rm-ing the codebase). Hoping to set the ground for future research into how developer happiness affects software production.
  2. GitLab Database Incident Report -- YP thinks that perhaps pg_basebackup is being super pedantic about there being an empty data directory, decides to remove the directory. After a second or two he notices he ran it on [...], instead of [...]. YP died for your sins.
  3. Compliance as Code -- instead of relying on checklists and procedures and meetings, the policies and rules are enforced (and tracked) through automated controls, which are wired into configuration management tools and the Continuous Delivery pipeline. Every change ties back to version control and a ticketing system like Jira for traceability and auditability: all changes must be made under a ticket, and the ticket is automatically updated along the pipeline, from the initial request for work all the way to deployment.
  4. Ethical Considerations in AI Courses -- In this article, we provide practical case studies and links to resources for use by AI educators. We also provide concrete suggestions on how to integrate AI ethics into a general artificial intelligence course and how to teach a stand-alone artificial intelligence ethics course.
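
The rule quoted in the Compliance as Code item, that every change must be made under a ticket, is the sort of policy that can be checked automatically in a pipeline. A minimal sketch, assuming Jira-style keys such as `OPS-42` appear in commit messages; the pattern, function names, and sample messages are hypothetical and not taken from the linked article.

```python
import re

# Assumed ticket format: uppercase project key, hyphen, number (e.g. OPS-42).
TICKET_PATTERN = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def untracked_changes(commit_messages):
    """Return the messages that do not reference any ticket key."""
    return [m for m in commit_messages if not TICKET_PATTERN.search(m)]


def enforce_ticket_policy(commit_messages):
    """Pipeline gate: raise if any change lacks a ticket reference."""
    missing = untracked_changes(commit_messages)
    if missing:
        raise ValueError("changes without a ticket: %r" % missing)
    return True


messages = [
    "OPS-42: rotate TLS certificates",
    "fix typo in README",
    "Bump base image (SEC-7)",
]
violations = untracked_changes(messages)
```

With these sample messages, `violations` contains only `"fix typo in README"`, and `enforce_ticket_policy(messages)` would raise and fail the build. A real gate would also confirm that the referenced ticket exists and is open, which needs the ticketing system's API and is left out here.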

Four short links: 31 January 2017

Historic Language, Activist Security, Microcode Assembler, and PDP-10 ITS Source

  1. Computer Language We Get From the Mark I -- loop, patch, library, bug...all illustrated.
  2. Twitter Activist Security (The Grugq) -- This guide hopes to help reduce the personal risks to individuals while empowering their ability to act safely.
  3. mcasm -- microcode assembler.
  4. PDP-10 ITS -- This repository contains source code, tools, and scripts to build an ITS system from scratch. ITS is the Incompatible Timesharing System. Trivia: it's the OS that the original EMACS was written for, and the original Jargon File was written on.