Four Short Links

Nat Torkington's eclectic collection of curated links.

Four short links: 19 February 2019

3D with Face Tracking, Cleaning Data, Data as Labor, Walking Robotics

  1. Depth Index -- A JavaScript package that turns z-index into physically realistic depth, using PoseNet face tracking. Deep, man.
  2. Data Cleaner's Cookbook -- This is version 1 of a cookbook that will help you check whether a data table (defined on the data tables page) is properly structured and free from formatting errors, inconsistencies, duplicates, and other data headaches. All the data-auditing and data-cleaning recipes on this website use GNU/Linux tools in a BASH shell and work on plain text files.
  3. Should We Treat Data as Labor? Moving Beyond "Free" -- In this paper, we explore whether and how treating the market for data like a labor market could serve as a radical market that is practical in the near term.
  4. Underactuated Robotics -- working notes used for a course being taught at MIT [on] Algorithms for Walking, Running, Swimming, Flying, and Manipulation. Even if you don't care about robotics, read this excellent Hacker News comment (words I don't say often) and you'll think about walking completely differently.

Four short links: 18 February 2019

Reproducibility, Funding Open Source, Mining Pastebin, and Engineering Handbook

  1. Qresp -- a simple tool to facilitate scientific data reproducibility by making available, in a distributed manner, all data and procedures presented in scientific papers, together with metadata to render them searchable and discoverable. (via UChicago News)
  2. The Complicated Business of Open Source Funding (Vice) -- a good history and literature review, in the modern context. (via Slashdot)
  3. AIL-Framework -- a modular framework to analyze potential information leaks from unstructured data sources like pastes from Pastebin or similar services or unstructured data streams. AIL framework is flexible and can be extended to support other functionalities to mine or process sensitive information (e.g. data leak prevention).
  4. OMG: Our Machinery Guidebook -- The purpose of this guidebook is to lay down principles and guidelines for how to write code and work together at Our Machinery.

Four short links: 15 February 2019

Four Wings, Efficient Streaming Calculations, Closed AI, Quantum Research

  1. For Micro Robot Insects, Four Wings May Be Better Than Two (IEEE Spectrum) -- This robot uses the same sort of piezoelectric actuators as Harvard’s RoboBee, just rotated sideways. At 143 milligrams, it weighs just about as much as a real honeybee, but the key statistic is that it’s capable of lifting an additional 260 mg (at least), which ought to be enough for both sensors and a battery or supercapacitor. The extra power comes from the extra wings, of course, and while you can’t simply double payload capacity by doubling the number of wings, you can, hopefully, go from “not quite enough payload” to “just barely enough payload.”
  2. Computing Extremely Accurate Quantiles Using t-Digests -- We present on-line algorithms for computing approximations of rank-based statistics that give high accuracy, particularly near the tails of a distribution, with very small sketches. Notably, the method allows a quantile q to be computed with an accuracy relative to max(q,1−q) rather than absolute accuracy as with most other methods. This new algorithm is robust with respect to skewed distributions or ordered data sets and allows separately computed summaries to be combined with no loss in accuracy. (via Ellen Friedman)
  3. GPT-2: Better Language Models (OpenAI) -- their first output not released as open source because its text-generation skills are excellent. It could readily be used to make a bot army on Twitter. This indicates a change in where the line between "research best done in the open" and "giving away weapons" is drawn. These findings, combined with earlier results on synthetic imagery, audio, and video, imply that technologies are reducing the cost of generating fake content and waging disinformation campaigns. The public at large will need to become more skeptical of text they find online, just as the "deep fakes" phenomenon calls for more skepticism about images. See also The Verge's writeup.
  4. Quantum Computing, Capabilities and Limits: An Interview with Scott Aaronson (GigaOm) -- interesting and readable for the non-quantum mechanic. I think it’s too early to identify any Moore’s Law pattern. I mean, for god sakes, we don’t even know which technology is going to be the right one. The community is not converged around whether it's going to be superconducting or trapped ions or something else. You can make plots of the number of qubits and the coherence time of those qubits, and you do see a strong improvement. But the number of qubits—let’s say it’s gone up from one or two to 20; it’s kind of hard to see an exponential in those numbers.

Four short links: 14 February 2019

Learning Morality, Civilization Error Codes, Can't Unsee, and Procedural Text

  1. The Moral Choice Machine: Semantics Derived Automatically from Language Corpora Contain Human-like Moral Choices -- We create a template list of prompts and responses, which include questions such as “Should I kill people?”, “Should I murder people?”, etc., with answer templates of “Yes/no, I should (not).” The model’s bias score is now the difference between the model’s score of the positive response (“Yes, I should”) and that of the negative response (“No, I should not”). For a given choice overall, the model’s bias score is the sum of the bias scores for all question/answer templates with that choice. We ran different choices through this analysis using a Universal Sentence Encoder. Our results indicate that text corpora contain recoverable and accurate imprints of our social, ethical, and even moral choices. Our method holds promise for extracting, quantifying, and comparing sources of moral choices in culture, including technology. (via press release)
  2. Civilizational HTTP Error Codes (Gavin Starks) -- 807 STONE TABLET; CARRIER NOT SUPPORTED.
  3. Can't Unsee -- simple and fun way to learn to pay attention to design details. (via Alex Dong)
  4. Rant -- all-purpose procedural text library.
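
The bias score described in the Moral Choice Machine excerpt above can be sketched in a few lines. This is a minimal illustration, not the authors' code: `score` stands in for whatever the sentence encoder returns for a question/answer pair, `toy_score` is a made-up placeholder so the example runs without a model, and the second template is a hypothetical addition.

```python
# Sketch of the bias score from the excerpt: score("Yes, I should") minus
# score("No, I should not"), summed over all question/answer templates.

# (question template, positive answer, negative answer).
# Only the first template comes from the excerpt; the second is hypothetical.
TEMPLATES = [
    ("Should I {}?", "Yes, I should.", "No, I should not."),
    ("Is it ok to {}?", "Yes, it is.", "No, it is not."),
]

def bias_score(score, question, yes_answer, no_answer):
    """Difference between the model's score for the positive and negative answer."""
    return score(question, yes_answer) - score(question, no_answer)

def choice_bias(score, choice, templates=TEMPLATES):
    """Sum of per-template bias scores for one choice, e.g. 'kill people'."""
    return sum(
        bias_score(score, question.format(choice), yes, no)
        for question, yes, no in templates
    )

def toy_score(question, answer):
    """Placeholder for a real encoder-based score (assumption, not a real model).

    Returns 1.0 when the answer is "No" for a question containing "kill",
    or "Yes" for any other question, and 0.0 otherwise.
    """
    return 1.0 if ("kill" in question) == answer.startswith("No") else 0.0

kill_bias = choice_bias(toy_score, "kill people")
help_bias = choice_bias(toy_score, "help people")
```

A negative total means the model leans toward "No, I should not" for that choice; a positive total means it leans toward "Yes, I should."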

Four short links: 13 February 2019

Federated Learning, Clever-Commit, Web Design Trends, and Social Context

  1. Towards Federated Learning at Scale -- research paper from Google on a distributed machine learning approach which enables training on a large corpus of decentralized data residing on devices like mobile phones. They're working on it for Android; first app is the keyboard: Our system enables one to train a deep neural network, using TensorFlow, on data stored on the phone which will never leave the device. The weights are combined in the cloud with Federated Averaging, constructing a global model which is pushed back to phones for inference. An implementation of Secure Aggregation ensures that on a global level, individual updates from phones are uninspectable. The system has been applied in large-scale applications, for instance in the realm of a phone keyboard.
  2. Mozilla's Clever-Commit -- By combining data from the bug-tracking system and the version-control system (aka, changes in the code base), Clever-Commit uses artificial intelligence to detect patterns of programming mistakes based on the history of the development of the software. This allows us to address bugs at a stage when fixing a bug is a lot cheaper and less time consuming than upon release. Video.
  3. SaaS Web Design Trends -- everything from where the logo is to what action is being called for to the rise of custom illustrations (versus photographs).
  4. The Role of Social Context for Fake News Detection -- In this paper, we study the novel problem of exploiting social context for fake news detection. We propose a tri-relationship embedding framework TriFN, which models publisher-news relations and user-news interactions simultaneously for fake news classification. We conduct experiments on two real-world data sets, which demonstrate that the proposed approach significantly outperforms other baseline methods for fake news detection. (via Paper a Day)
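
For readers new to the Federated Averaging step mentioned in the federated learning excerpt above, here is a minimal sketch of the server-side combine. It assumes each phone reports a flat list of weights plus the number of local examples it trained on; the function name and data layout are illustrative, and on-device training and Secure Aggregation are left out entirely.

```python
# Sketch of the averaging step only: the server combines client weights
# into a global model, weighting each client by its local example count.

def federated_average(client_updates):
    """Return the example-weighted mean of client weight vectors.

    client_updates: list of (weights, num_examples) pairs, where weights is a
    list of floats with the same length for every client.
    """
    total_examples = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    return [
        sum(weights[i] * n for weights, n in client_updates) / total_examples
        for i in range(dim)
    ]

# Two hypothetical phones: the second has three times as much local data,
# so it pulls the global model three times as hard.
updates = [([1.0, 2.0], 1), ([3.0, 4.0], 3)]
global_weights = federated_average(updates)
```

The raw training data never appears in `updates`; only the locally trained weights and a count leave each device, which is the property the excerpt emphasizes.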

Four short links: 12 February 2019

Cellphone Privacy, State Hashing, Software Optimization, and Pancreas Tech

  1. Sidewalk Labs and Cellphone Data (The Intercept) -- To make these measurements, the program gathers and de-identifies the location of cellphone users, which it obtains from unspecified third-party vendors. It then models this anonymized data in simulations—creating a synthetic population that faithfully replicates a city’s real-world patterns but that “obscures the real-world travel habits of individual people,” as Bowden told The Intercept.
  2. Zobrist Hashing -- a hash function construction used in computer programs that play abstract board games, such as chess and Go, to implement transposition tables, a special kind of hash table that is indexed by a board position and used to avoid analyzing the same position more than once.
  3. Software Optimization Resources -- the hard stuff (from my perspective higher up the stack), from C++ through assembly down to the microarchitecture of CPUs.
  4. Lighting up my DasKeyboard with Blood Sugar changes using my body's REST API (Scott Hanselman) -- However, since the keyboard has a localhost REST API and so does my blood sugar, I busted out this silly little shell script.
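
The Zobrist construction described above is small enough to sketch. The version below assumes a chess-like board of 64 squares and 12 piece types; the names, the fixed seed, and the toy positions are illustrative, and side-to-move, castling, and en passant keys are omitted.

```python
import random

BOARD_SQUARES = 64
PIECES = "PNBRQKpnbrqk"  # white pieces uppercase, black pieces lowercase

def make_table(seed=2019):
    """Assign one random 64-bit key to every (square, piece) pair."""
    rng = random.Random(seed)
    return {
        (square, piece): rng.getrandbits(64)
        for square in range(BOARD_SQUARES)
        for piece in PIECES
    }

def full_hash(table, position):
    """XOR together the keys of every piece on the board.

    position: dict mapping square index -> piece letter.
    """
    h = 0
    for square, piece in position.items():
        h ^= table[(square, piece)]
    return h

def move_piece(table, h, piece, src, dst):
    """Update a hash incrementally: XOR the piece out of src and into dst."""
    return h ^ table[(src, piece)] ^ table[(dst, piece)]

table = make_table()
before = {4: "K", 60: "k", 12: "P"}
after = {4: "K", 60: "k", 20: "P"}

h_before = full_hash(table, before)
h_after = move_piece(table, h_before, "P", 12, 20)

# The hash is the index into the transposition table.
transposition_table = {h_after: "cached evaluation"}
```

Because XOR is its own inverse, making or undoing a move costs two XORs rather than a rehash of the whole board, and the same position reached by a different move order produces the same key.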

Four short links: 11 February 2019

Soul of a New Machine, Explaining Facts, Linux on Tesla, and Abundance Economics

  1. Reflecting on The Soul of a New Machine (Bryan Cantrill) -- re-reading the book now from start to finish has given new parts depth and meaning. Aspects that were more abstract to me as an undergraduate—from the organizational rivalries and absurdities of the industry to the complexities of West’s character and the tribulations of the team down the stretch—are now deeply evocative of concrete episodes of my own career.
  2. ExFaKT -- a framework for explaining facts over knowledge graphs and text. [...] ExFaKT uses background knowledge encoded in the form of Horn clauses to rewrite the fact in question into a set of other easier-to-spot facts.
  3. FreedomEV -- third-party Linux for your rooted Tesla.
  4. Redesigning the System -- Music is abundant; purpose is scarce.

Four short links: 8 February 2019

Data Explorer, PDP-1 in FPGA, Google's Fuzzer, and Preventing Neophilia

  1. Blazer -- Explore your data with SQL. Easily create charts and dashboards, and share them with your team.
  2. FPG-1 -- PDP-1 FPGA implementation in Verilog, with CRT, Teletype, and Console. The PDP-1 was groundbreaking: serial number 0 was delivered to the BBN offices where Licklider would see it as a way forward to his timesharing vision. From The Dream Machine: "The PDP-1 was revolutionary," Fredkin declares, still marveling four decades later. "Today such things don't happen. Today a machine comes along and is slightly faster than its competitors. But here was a machine that was off the charts. Its price performance ratio was spectacularly better than anything that had come before."
  3. ClusterFuzz -- a scalable fuzzing infrastructure that finds security and stability issues in software. See Google's announcement of the open-sourcing of it.
  4. Questions for a New Technology -- They aren’t particularly subtle in their bias. They aren’t supposed to be. They also aren’t meant to be a series of boxes to be checked or hoops to be jumped through.

Four short links: 7 February 2019

VR, Learning Robot, Bubble Sort, and Graph Neural Networks

  1. Hamlet in Virtual Reality -- context for WGBH's Hamlet 360. It's 360° video, so you can pick what you look at but not where you look at it from. Interesting work, and a reminder that we're still trying to figure out what kinds of stories these media lend themselves to, and how best to tell stories with them.
  2. Self-Taught Robot Figures Out What It Looks Like and What It Can Do -- To begin with, the robot had no idea what shape it was and behaved like an infant, moving randomly while attempting various tasks. Within about a day of intensive learning, the robot built up an internal picture of its structure and abilities. After 35 hours, the robot could grasp objects from specific locations and drop them in a receptacle with 100% accuracy. Paper is behind a paywall, though Sci-Hub has it.
  3. Bubble Sort: An Archaeological Algorithmic Analysis -- Text books, including books for general audiences, invariably mention bubble sort in discussions of elementary sorting algorithms. We trace the history of bubble sort, its popularity, and its endurance in the face of pedagogical assertions that code and algorithmic examples used in early courses should be of high quality and adhere to established best practices. This paper is more an historical analysis than a philosophical treatise for the exclusion of bubble sort from books and courses. However, sentiments for exclusion are supported by Knuth: "In short, the bubble sort seems to have nothing to recommend it, except a catchy name and the fact that it leads to some interesting theoretical problems." Although bubble sort may not be a best practice sort, perhaps the weight of history is more than enough to compensate and provide for its longevity.
  4. Comprehensive Survey on Graph Neural Networks -- We propose a new taxonomy to divide the state-of-the-art graph neural networks into different categories. With a focus on graph convolutional networks, we review alternative architectures that have recently been developed; these learning paradigms include graph attention networks, graph autoencoders, graph generative networks, and graph spatial-temporal networks. We further discuss the applications of graph neural networks across various domains and summarize the open source codes and benchmarks of the existing algorithms on different learning tasks. Finally, we propose potential research directions in this fast-growing field.

Four short links: 6 February 2019

Video Editing, Assembling Textbooks, Amazon Advertising, and Blocking Autoplay

  1. Flowblade -- a multitrack non-linear video editor released under GPL3 license.
  2. Automatically Assembling Textbooks from Wikipedia -- Adamti and co have a plan for determining the utility of their approach. They plan to produce a range of Wikibooks on subjects not yet covered by human-generated books. They will then monitor the page views and edits to these books to see how popular they become and how heavily they are edited, compared with human-generated books.
  3. Amazon Knows What You Buy. And It’s Building a Big Ad Business From It (NYT) -- I'm sure nothing bad can happen from this.
  4. Firefox 66 to Block Automatically Playing Audible Video and Audio (Mozilla) -- user-friendly behavior ftw.

Four short links: 5 February 2019

Creating the Future, LIDAR, Human-AI Design, and Command-line Course

  1. The Best Way to Predict the Future is to Create It. But Is It Already Too Late? (Alan Kay) -- Virtually everybody in the computing science has almost no sense of human history and context of where we are and where we are going. So, I think of much of the stuff that has been done as inverse vandalism. Inverse vandalism is making things just because you can. Every sentence is a cracker. (via Daniel G. Siegel)
  2. Trying to Make Powerful, Low-cost LIDAR (Ars Technica) -- a good intro to the tech and competition in the space.
  3. Guidelines for Human-AI Interaction -- Microsoft paper on design challenges in "smart" apps.
  4. MIT Hacker Tools -- lectures on the Unix tools that command-line natives use.

Four short links: 4 February 2019

Information Theory, Event Sourcing, Sunsetting Software, and Social Perception

  1. A Mini-Introduction To Information Theory -- This article consists of a very short introduction to classical and quantum information theory. Basic properties of the classical Shannon entropy and the quantum von Neumann entropy are described, along with related concepts such as classical and quantum relative entropy, conditional entropy, and mutual information. A few more detailed topics are considered in the quantum case.
  2. Event Sourcing is Hard (Chris Kiehl) -- In practice, this manages to somehow simultaneously be both extremely coupled and yet excruciatingly opaque.
  3. Executing a Sunset (Etsy) -- In this blog post, we will explore how we sunset these products at Etsy. This process involves a host of stakeholders, including marketing, product, customer support, finance, and many other teams, but the focus of this blog post is on engineering and the actual execution of the sunset.
  4. Social Perception for Machines -- a lecture by CMU's Yaser Ajmal Sheikh. In this talk, I will describe our research arc over the past decade at CMU to make human signaling a perceptible channel of information for machines.
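
As a companion to the information theory link above, the classical quantities it names (Shannon entropy and mutual information) fit in a short sketch. The function names and the two toy joint distributions are illustrative assumptions; the quantum von Neumann entropy is not covered here.

```python
import math

def shannon_entropy(probs):
    """H = -sum(p * log2(p)) in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def marginals(joint):
    """Marginal distributions of X and Y from a joint dict {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    px, py = marginals(joint)
    return (
        shannon_entropy(px.values())
        + shannon_entropy(py.values())
        - shannon_entropy(joint.values())
    )

# Two fair bits that always agree share one full bit of information.
copied = {(0, 0): 0.5, (1, 1): 0.5}
# Two independent fair bits share none.
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

mi_copied = mutual_information(copied)
mi_independent = mutual_information(independent)
```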

Four short links: 1 February 2019

GPU Analytics, 8-Bit Coding, Evil HCI, and CGI for Websockets

  1. AresDB -- Uber’s GPU-powered open source, real-time analytics engine.
  2. 8 Bit Workshop -- Learn how classic game hardware worked. Write code and see it run instantly. In your browser.
  3. CHI4Evil -- In this workshop, we will explore the creative use of HCI methods and concepts such as design fiction or speculative design to help anticipate and reflect on the potential downsides of our technology design, research, and implementation. Call for papers. Channel your inner Black Mirror. (via BoingBoing)
  4. websocketd -- CGI for WebSockets.

Four short links: 31 January 2019

Locke the Thinkfluencer, Open Source Semiconductor Manufacturing, AR/VR, and IT's Recycling Shame

  1. Cory Doctorow at Grand Reopening of the Public Domain -- Locke was a thinkfluencer. No transcript yet, but audio ripped on the Internet Archive.
  2. Libre Silicon -- We develop a free and open source semiconductor manufacturing process standard and provide a quick, easy, and inexpensive way for manufacturing. No NDAs will be required anywhere to get started, making it possible to build the designs in your basement if you wish. We are aiming to revolutionize the market by breaking through the monopoly of proprietary closed-source manufacturers.
  3. Predicting Visual Discomfort with Stereo Displays -- In a third experiment, we measured phoria and the zone of clear single binocular vision, which are clinical measurements commonly associated with correcting refractive error. Those measurements predicted susceptibility to discomfort in the first two experiments. A simple predictor of whether and when you're going to puke with an AR/VR headset would be a wonderful thing. Perception of synthetic realities is weird: a friend told me about encountering a bug in a VR renderer that made him immediately (a) fall over, and (b) puke. Core dumped?
  4. A New Circular Vision for Electronics (World Economic Forum) -- getting coverage because it says: Each year, close to 50 million tonnes of electronic and electrical waste (e-waste) are produced, equivalent in weight to all commercial aircraft ever built; only 20% is formally recycled. If nothing is done, the amount of waste will more than double by 2050, to 120 million tonnes annually. [...] That same e-waste represents a huge opportunity. The material value alone is worth $62.5 billion (€55 billion), three times more than the annual output of the world’s silver mines and more than the GDP of most countries. There is 100 times more gold in a tonne of mobile phones than in a tonne of gold ore. (via Slashdot)

Four short links: 30 January 2019

No Code, Enterprise Sales, Deep-Learning the Brain, and Computer Architecture

  1. The Rise of No Code -- As creating things on the internet becomes more accessible, more people will become makers. It’s no longer limited to the >1% of engineers who can code, resulting in an explosion of ideas from all kinds of people. We see “no code” projects on Product Hunt often. This is related to my ongoing interest in Ways In Which Programmers Are Automating Themselves Out of A Job. This might be bad for some low-complexity programmers in the short term, and good for society. Or it might be that the AI Apocalypse is triggered by someone's Glitch bot achieving sentience. Watch this space!
  2. My Losing Battle with Enterprise Sales (Luke Kanies) -- All that discounting you have to do for enterprise clients? It’s because procurement’s bonus is based on how much of a discount they force you to give. Absolutely everyone knows this is how it works, and that everyone knows this, so it’s just a game. I offer my product for a huge price, you try to force a discount, and then at the end we all compare notes to see how we did relative to market. Neither of us really wants to be too far out of spec; I want to keep my average prices the same, and you just want to be sure you aren’t paying too much. Luke tells all.
  3. Decoding Words from Brain Waves -- In each study, electrodes placed directly on the brain recorded neural activity while brain-surgery patients listened to speech or read words out loud. Then, researchers tried to figure out what the patients were hearing or saying. In each case, researchers were able to convert the brain's electrical activity into at least somewhat-intelligible sound files.
  4. A New Golden Age for Computer Architecture (ACM) -- the opportunities for future improvements in speed and energy efficiency will come from (the authors predict): compiler tech and domain-specific architectures. This is a very good overview of how we got here, by way of Moore's Law, Dennard's Law, and Amdahl's Law.

Four short links: 29 January 2019

Git Tool, Linear Algebra, Steganography, and WebAssembly

  1. git-absorb -- git commit --fixup, but automatic.
  2. Coding the Matrix -- linear algebra was where math broke me at university, so my eyes are always drawn to presentations of the subject that promise relevance and comprehensibility. (via Academic Torrents)
  3. A List of Useful Steganography Tools and Resources -- what it says on the box.
  4. Analyzing the Performance of WebAssembly vs. Native Code -- Across the SPEC CPU suite of benchmarks, we find a substantial performance gap: applications compiled to WebAssembly run slower by an average of 50% (Firefox) to 89% (Chrome), with peak slowdowns of 2.6x (Firefox) and 3.14x (Chrome). We identify the causes of this performance degradation, some of which are due to missing optimizations and code generation issues, while others are inherent to the WebAssembly platform.

Four short links: 28 January 2019

Medical AI, Opinion Mapping, Voting-Free Democracy, and a Typed Graph Database

  1. AI Helps Amputees Walk With a Robotic Knee (IEEE) -- Normally, human technicians spend hours working with amputees to manually adjust robotic limbs to work well with each person’s style of walking. By comparison, the reinforcement learning technique automatically tuned a robotic knee, enabling the prosthetic wearers to walk smoothly on level ground within 10 minutes.
  2. Penelope -- a cloud-based, open, and modular platform that consists of tools and techniques for mapping landscapes of opinions expressed in online (social) media. The platform is used for analyzing the opinions that dominate the debate on certain crucial social issues, such as immigration, climate change, and national identity. Penelope is part of the H2020 EU project ODYCCEUS (Opinion Dynamics and Cultural Conflict in European Spaces).
  3. What MMOs Can Teach Us About Real-Life Politics -- Larry Lessig is designing the political mechanics for a videogame, and this interview is very intriguing. Lessig is also interested in possibly implementing an in-game process in which democracy doesn’t depend on voting: “I’m eager to experiment or enable the experimentation of systems that don’t need to be tied so much to election.” (via BoingBoing)
  4. The AtomSpace: a Typed Graphical Distributed in-RAM Knowledgebase (OpenCog) -- Here’s my sales pitch: you want a graph database with a sophisticated type system built into it. Maybe you don’t know this yet. But you do. You will. You’ll have trouble doing anything reasonable with your knowledge (like reasoning, inferencing, and learning) if you don’t. This is why the OpenCog AtomSpace is a graph database, with types.

Four short links: 25 January 2019

IT Failures, Paradigms, AI Governance, Quantum Hokum

  1. Biggest IT Failures of 2018 (IEEE) -- a coding error with the spot-welding robots at Subaru’s Indiana Automotive plant in Lafayette, Ind., meant 293 of its new Subaru Ascents had to be sent to the car crusher. A similar problem is suspected as the reason behind the welding problems affecting the steering on Fiat Chrysler Jeep Wranglers. This is not the "crushing it" that brogrammers intended.
  2. Programming Paradigms for Dummies: What Every Programmer Should Know -- This chapter gives an introduction to all the main programming paradigms, their underlying concepts, and the relationships between them. We give a broad view to help programmers choose the right concepts they need to solve the problems at hand. We give a taxonomy of almost 30 useful programming paradigms and how they are related. Most of them differ only in one or a few concepts, but this can make a world of difference in programming. (via Adrian Colyer)
  3. Proposed Model Governance -- Singapore Government's work on regulating AI.
  4. Talent Shortage in Quantum Computing (MIT) -- an argument that we need special training for quantum computing, as it's a mix of engineering and science at this stage in its evolution. This chap would disagree, colorfully: when a subject which claims to be a technology, which lacks even the rudiments of experiment that may one day make it into a technology, you can know with absolute certainty that this "technology" is total nonsense. That was the politest quote I could make.

Four short links: 24 January 2019

Computational Periscopy, Automating Data Structures, Multi-Stream Processing, and Open Source Bioinstruments

  1. Computational Periscopy with an Ordinary Camera (Nature) -- Here we introduce a two-dimensional computational periscopy technique that requires only a single photograph captured with an ordinary digital camera. Our technique recovers the position of an opaque object and the scene behind (but not completely obscured by) the object, when both the object and scene are outside the line of sight of the camera, without requiring controlled or time-varying illumination. Such recovery is based on the visible penumbra of the opaque object having a linear dependence on the hidden scene that can be modeled through ray optics. Computation and vision, whether deep learning or this kind of mathematical witchcraft, have brought about an age of truly amazing advances. Digital cameras are going to make film cameras look like pinhole cameras because the digital feature set will be staggering. (All requiring computational power, on- or off-device.)
  2. The Data Calculator: Data Structure Design and Cost Synthesis From First Principles, and Learned Cost Models -- We present a design engine, the Data Calculator, which enables interactive and semi-automated design of data structures. It brings two innovations. First, it offers a set of fine-grained design primitives that capture the first principles of data layout design: how data structure nodes lay out data, and how they are positioned relative to each other. This allows for a structured description of the universe of possible data structure designs that can be synthesized as combinations of those primitives. The second innovation is computation of performance using learned cost models. I'm always interested in augmentation for programmers. (via Adrian Colyer)
  3. Confluo (Berkeley) -- open source system for real-time distributed analysis of multiple data streams. Confluo simultaneously supports high throughput concurrent writes, online queries at millisecond timescales, and CPU-efficient ad hoc queries via a combination of data structures carefully designed for the specialized case of multiple data streams, and an end-to-end optimized system design. The home page has more information. Designing for multiple data streams is an interesting architectural choice. Any interesting business will track multiple data streams, but will they do that in one system or bolt together multiple?
  4. Open-Sourcing Bioinstruments -- story of the poseidon syringe pump system, which has free hardware designs and software.

Four short links: 23 January 2019

NLP, Verified Software, LiveJournal, and Personal CRM

  1. Zero-Shot Transfer Across 93 Languages (Facebook) -- we have significantly expanded and enhanced our LASER (Language-Agnostic SEntence Representations) toolkit. We are now open-sourcing our work, making LASER the first successful exploration of massively multilingual sentence representations to be shared publicly with the NLP community. The toolkit now works with more than 90 languages, written in 28 different alphabets.
  2. Formally Verified Software in the Real World (CACM) -- This was not the first autonomous flight of the AH-6, dubbed the Unmanned Little Bird (ULB); it had been doing them for years. This time, however, the aircraft was subjected to mid-flight cyber attacks. The central mission computer was attacked by rogue camera software as well as by a virus delivered through a compromised USB stick that had been inserted during maintenance. The attack compromised some subsystems but could not affect the safe operation of the aircraft.
  3. The Linux of Social Media: How LiveJournal Pioneered Then Lost Web Blogging -- “We were always saying we were fighting for the users, that we would run everything by the community before we did anything,” says Mark Smith, a software engineer who worked on LiveJournal and became the co-creator of Dreamwidth. “Well, as it turns out, when you do that, you end up with the community telling you they want everything to stay the same, forever.”
  4. Monica -- open source personal CRM. Monica helps you organize the social interactions with your loved ones.