Four Short Links

Nat Torkington’s eclectic collection of curated links.

Four short links: 2 October 2020

Memristors, USB-C, Unfck the Internet, and Misinformation

By Nat Torkington
  1. Single Device Behaves Like a Neuron — On its own, using a simple DC voltage as the input, the device outputs not just simple spikes, as some other devices can manage, but the whole array of neural activity—bursts of spikes, self-sustained oscillations, and other stuff that goes on in your brain. (Paper)
  2. USB-C Is a Total Mess — Different power standards, differing video standards, and no way for a person to look at a cable or a connector and know what it can do. “The great thing about standards is that there’s so many to choose from.”
  3. Unfck The Internet — Mozilla’s new campaign makes sense to me, but I can’t say it makes sense to launch during a pandemic election …
  4. How Civil Society Can Combat Misinformation and Hate Speech Without Making It Worse — Good suggestions, backed by research. The six strategies for countering misinformation and hate speech: connected communities; the Truth Sandwich; pre-bunking; distributed debunking; localize the context; humor over rumor.

Four short links: 29 Sep 2020

IOT Security, Scale vs Experience, Social Bots, and Distributed Consensus

By Nat Torkington
  1. When Coffeemakers Demand Ransom — So he then examined the mechanism the coffee maker used to receive firmware updates. It turned out they were received from the phone with—you guessed it—no encryption, no authentication, and no code signing. Nothing remarkable here other than it’s 2020 and companies still put crappy software into their hardware.
  2. Returns to Scale vs Experience — Big machines are sometimes more efficient. But they cost more, so fewer can be produced with a finite budget. Small machines are cheaper and may benefit from improvement over time, driven by experience in building more units. When does this experience lead to greater overall efficiency? We derive an approximation which, given a learning rate, tells how much smaller a machine must be to overcome an initial efficiency disadvantage.
  3. Ten Years of Studying Social Bots — The first work that specifically addressed the detection of automated accounts in online social networks dates back to January 2010. A good history, courtesy of the ACM.
  4. CRDTs are the Future — A love note to CRDTs from someone who worked on Wave.

Four short links: 25 September 2020

Incremental Computing, Migration Lessons, Microsoft GPT-3, and Glitch

By Nat Torkington
  1. Adapton — A program P is incremental if repeating P with a changed input is faster than from-scratch computation. Adapton offers programming language abstractions for incremental computation.
  2. Migration Lessons Learned — Keep your migration scripts away from your production code; Keep it low-tech, don’t deserialize; Write tests to exercise each migration script individually; Consider running long migrations online; Consider versioning your documents.
  3. Microsoft Exclusive License to GPT-3 — If you’re selling compute, the logical complement is a clever system that sucks compute. I assume that’s why Oracle now have a slice of TikTok. Capitalism is weird.
  4. PNGlitch — However, we do not look at image formats from a general point of view, but rather think of ways to glitch them. When we look at PNG from the point of view of glitch, what kind of peculiarity does it have?
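
The definition of incremental quoted for Adapton above can be illustrated with a toy. This is a minimal sketch, not Adapton's API: Adapton tracks dependencies between computations, while this example only caches per-chunk results. The class name `IncrementalSum` and the `work` counter are hypothetical, and `sum` stands in for an expensive per-chunk computation.

```python
class IncrementalSum:
    """Toy incremental computation: cache per-chunk results between runs."""

    def __init__(self):
        self._cache = {}  # chunk contents (as a tuple) -> cached result
        self.work = 0     # how many chunks were actually recomputed

    def total(self, chunks):
        result = 0
        for chunk in chunks:
            key = tuple(chunk)
            if key not in self._cache:
                # Stand-in for an expensive computation over one chunk.
                self._cache[key] = sum(key)
                self.work += 1
            result += self._cache[key]
        return result


inc = IncrementalSum()
data = [[1, 2], [3, 4], [5, 6]]
first = inc.total(data)    # from scratch: every chunk is computed
data[1] = [3, 5]           # change one part of the input
second = inc.total(data)   # repeat: only the changed chunk is recomputed
print(first, second, inc.work)
```

The second run touches one chunk instead of three, which is the "faster than from-scratch" property in miniature.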

Four short links: 18 Sep 2020

Contemporary Issues in Computer Science, The Hardware Lottery, Hope, and Culture

By Nat Torkington
  1. CS349 – Contemporary Issues in Computer Science — This class examines ethical frameworks, modern ethical concerns related to computer science and technology, and clear oral and written communication. Topics we will explore include policy vacuums created by new technology, copyright and patent, software bugs and liability, freedom of speech, privacy, security, employment and job markets, warfare and state-building, wealth discrepancy and consumerism, environmental impact, and changing cultural norms and social contracts. Wonderful to see this content being tackled in universities.
  2. The Hardware Lottery — What follows is part position paper and part ahistorical review. This essay introduces the term hardware lottery to describe when a research idea wins because it is compatible with available software and hardware and not because the idea is superior to alternative research directions. We argue that choices about software and hardware have often played a decisive role in deciding the winners and losers in early computer science history.
  3. Hope — a Bitsy game about learning to rely on others and fighting against hopelessness, together. I look for software that makes people and societies stronger, rather than weaker.
  4. Your Values are the Rules You Break — If you are writing down “rules” and insisting that developers abide by them, it’s probably because your developers are continuously doing things you wish they wouldn’t. Usually, this isn’t because your developers don’t understand “the rules” and/or don’t like you—it’s because they know what the organization values, and those values are in conflict with your “rules,” and they’re trying to deliver that value.

Four short links: 16 Sep 2020

Concurrency, Quadruped Robot, Ethics Groups, and Threat Models for Differential Privacy

By Nat Torkington
  1. A Concurrency Cost Hierarchy — a higher level taxonomy that I use to think about concurrent performance. We’ll group the performance of concurrent operations into six broad levels running from fast to slow, with each level differing from its neighbors by roughly an order of magnitude in performance. From fast to slow, they are: Vanilla Instructions, Uncontended Atomics, Contended Atomics, System Calls, Implied Context Switch, Catastrophe.
  2. Open Source Quadruped Robot — Now with a robotic arm.
  3. AI Ethics Groups — without more geographic representation, they’ll produce a global vision for AI ethics that reflects the perspectives of people in only a few regions of the world, particularly North America and northwestern Europe. […] This lack of regional diversity reflects the current concentration of AI research (pdf): 86% of papers published at AI conferences in 2018 were attributed to authors in East Asia, North America, or Europe. And fewer than 10% of references listed in AI papers published in these regions are to papers from another region. Patents are also highly concentrated: 51% of AI patents published in 2018 were attributed to North America.
  4. Threat Models for Differential Privacy — Looks at risks around central, local, and hybrid models of differential privacy. Good insight and useful conclusions, e.g. As a result, the local model is only useful for queries with a very strong “signal.” Apple’s system, for example, uses the local model to estimate the popularity of emojis, but the results are only useful for the most popular emojis (i.e. where the “signal” is strongest). The local model is typically not used for more complex queries, like those used in the U.S. Census [3] or applications like machine learning.
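
To see why the local model only works for queries with a strong “signal,” here is a minimal sketch of classic randomized response, the textbook local-model mechanism. It is an illustration only and not Apple's system; the function names and the choice of epsilon are assumptions. Each user perturbs their own yes/no answer before reporting it, and the collector corrects for the known noise rate.

```python
import math
import random


def randomize(bit: bool, epsilon: float, rng: random.Random) -> bool:
    """Report the true bit with probability e^eps / (e^eps + 1), else flip it."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1)
    return bit if rng.random() < p_truth else not bit


def estimate_fraction(reports, epsilon: float) -> float:
    """Debias the noisy reports to estimate the true fraction of 'yes' answers."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    observed = sum(reports) / len(reports)
    # E[observed] = f * p + (1 - f) * (1 - p), solved here for f.
    return (observed - (1 - p)) / (2 * p - 1)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
truth = [i < 30_000 for i in range(100_000)]  # 30% of users say "yes"
reports = [randomize(bit, 1.0, rng) for bit in truth]
print(estimate_fraction(reports, 1.0))
```

The estimate lands near the true fraction only because there are many users and the answer is common. With a rare answer or few users, the per-user noise swamps the signal, which is the limitation the quoted passage describes.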

Four short links: 11 Sep 2020

Lipsync, Workflow, Knowledge Half-Life, and Security Engineering

By Nat Torkington
  1. Accurately Lipsync Video to Any Speech — In our paper, A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild, ACM Multimedia 2020, we aim to lip-sync unconstrained videos in the wild to any desired target speech. (Paper) Impressive.
  2. Temporal — Open source “workflow-as-code” engine. I can’t decide if this is awful or brilliant.
  3. The Rate of Obsolescence of Knowledge in Software Engineering — When I graduated, I was told the half-life for what I’d learned was 18 months. But not all knowledge is equivalent, as the author points out. The anatomy of an OS class is still relevant today. It’s interesting to look at the knowledge you struggle to acquire and ask yourself what its half-life will be.
  4. Security Engineering (3ed) — Drafts of the third edition, to be released in December, are available online but may go away. (via Bruce Schneier)

Four short links: 9 Sep 2020

Software Engineering, Architects and Plumbers, Pair Programming with AI, and Comments

By Nat Torkington
  1. Things I Learned to Become a Senior Software Engineer — Full of relatable growth moments, such as changing your code to make the test pass vs understanding why the test failed.
  2. The Future is Software Engineers Who Can’t Code — “There are lot of definitions of what a developer is […] It’s not just people who write code.” […] Microsoft has even given these “civilian” programmers a persona: Mort. […] The fictional “Mort” is a skilled professional, anyone from a business analyst to a construction site cost estimator, who needs computers to perform specific functions without mastering the intricacies of full-blown programming. As Mel Conway called it, the profession is bifurcating into architects and plumbers. Architects make complex pieces of software, plumbers bolt those pieces together.
  3. Pair Programming with AI — This makes sense to me: We don’t need complete, perfect solutions; we need partial solutions in situations where we don’t have all the information, and we need the ability to explore those solutions with an (artificially) intelligent partner.
  4. Writing System Software: Code Comments — Absolutely the best thing on software engineering that a software engineer will read all month. This is GOLD.

Four short links: 4 September 2020

Hardware Teardown, The Incredible Proof Machine, Better Programming, and ArangoDB

By Nat Torkington
  1. Inside the Digital Pregnancy Test — … is a paper pregnancy test and watch-battery-powered microcontroller connected to three LEDs, a photo-cell, and an LCD display. That (8-bit) microcontroller runs at 4MHz, almost as fast as an IBM PC did.
  2. The Incredible Proof Machine — Fun game (modelled on The Incredible Machine from the 90s) that teaches logic.
  3. Make Interfaces Hard to Misuse — Don’t push the responsibility of maintaining invariants required by your class on to its callers. Excellent advice.
  4. ArangoDB — a scalable open-source multi-model database natively supporting graph, document and search. All supported data models & access patterns can be combined in queries allowing for maximal flexibility.

Four short links: 2 September 2020

Debug Visualizer, Userland, Minglr, and Declarative Logic in Rust

By Nat Torkington
  1. VSCode Debug Visualizer — A VS Code extension for visualizing data structures while debugging. Like the VS Code’s watch view, but with rich visualizations of the watched value. The screencast is wow.
  2. Userland — an integrated dataflow environment for end-users. It allows users to interact with modules that implement functionality for different domains from a single user interface and combine these modules in creative ways. The talk shows it in action. It’s a spreadsheet and cells can be like a spreadsheet, or can be like a Unix shell, or can be an audio synthesizer (!).
  3. Minglr — Open source software (built on Jitsi) that facilitates the ad hoc mingling that might happen in the audience after a talk ends: see who’s there, pick people to talk to, talk to them. Interesting to see the florescence of social software thanks to lockdown.
  4. Crepe — a library that allows you to write declarative logic programs in Rust, with a Datalog-like syntax. The Reachable example is sweet. From initial testing, the generated code is very fast. Variants of transitive closure for large graphs (~1000 nodes) run at comparable speed to compiled Souffle, and at a fraction of the compilation time.
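
For readers who have not met Datalog, the transitive closure that Crepe's Reachable example computes boils down to two rules: an edge makes its endpoints reachable, and an edge followed by a reachable pair makes a longer reachable pair. The sketch below evaluates those rules to a fixed point in Python. It is not Crepe's syntax or its generated code, and the function name `reachable` is an assumption for illustration.

```python
def reachable(edges):
    """Transitive closure of a set of (src, dst) edges.

    Rules, Datalog-style:
      Reachable(x, y) <- Edge(x, y)
      Reachable(x, z) <- Edge(x, y), Reachable(y, z)
    """
    edges = set(edges)
    reach = set(edges)     # first rule: every edge is a reachable pair
    frontier = set(edges)  # facts derived in the previous round
    while frontier:
        # Second rule, applied only to the newest facts (semi-naive evaluation).
        new = {
            (x, z)
            for (x, y) in edges
            for (y2, z) in frontier
            if y == y2
        } - reach
        reach |= new
        frontier = new
    return reach


print(sorted(reachable({(1, 2), (2, 3), (3, 4)})))
```

The loop stops when a round derives nothing new, which is the same fixed-point idea a Datalog engine relies on.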

Four short links: 28 August 2020

Activity Watch, Natural Language Database Queries, Web Assembly, and Sci-Fi UI

By Nat Torkington
  1. Activity Watch — Open source, privacy-first, cross-platform app that automatically tracks how you spend time on your devices.
  2. Natural Language Database Queries — An interesting comment thread on Hacker News. Sample comments: I’ve done some previous digging into natural language SQL queries — there’s a good amount of research around this. But the error rate is always too high for useful production deployment and the systems I’ve looked at never handled ambiguity well. The target user for this is someone who knows nothing about SQL so ambiguity is guaranteed. and I worked for Cleargraph, a startup which built a natural language query layer on top of RDBMSes and which we sold to Tableau. We reached the same conclusions as you: that such a system must properly handle ambiguous queries, and users need to explicitly understand the results they’re viewing.
  3. Hands-On Web Assembly — Simple tutorial for Wasm.
  4. Arwes — Futuristic Sci-Fi and Cyberpunk Graphical User Interface Framework for Web Apps. Click through, it’s worth it.

Four short links: 25 August 2020

Wooden Turing Machine, Burnout, Ethics, and Entity Framework for Go

By Nat Torkington
  1. Wooden Turing Machine — (YouTube) Description of how it works. It implements three data elements and two states, sufficient for any calculation (discussed here). Not an infinite tape, because … wood.
  2. Emotional Resiliency in Leadership Report 2020 — Very interesting report, based on survey and science of burnout. It is primarily written for the survey respondents, and anyone dealing with burn-out and resilience issues either in themselves, family members and employees. If you’re only interested in how to address burn-out skip to section seven.
  3. Three Things Digital Ethics Can Learn From Medical Ethics — Ethics committees have at least three roles to play. The first is education. […] The second role of ethics committees is policy formation and review. […] The third role of ethics committees is to provide ethical consultation. and [T]echnological decisions are not only about facts (for example, about what is more efficient), but also about the kind of life we want and the kind of society we strive to build.
  4. Ent: An Entity Framework for Go — Simple, yet powerful ORM for modeling and querying data. (a) Schema As Code – model any database schema as Go objects. (b) Easily Traverse Any Graph – run queries, aggregations and traverse any graph structure easily. (c) Statically Typed And Explicit API – 100% statically typed and explicit API using code generation. (d) Multi Storage Driver – supports MySQL, PostgreSQL, SQLite and Gremlin. Built by Facebookers, inspired by an FB-internal entity framework.

Four short links: 21 August 2020

Data Bugs, Fairness in Machine Learning, Social Architecture, and Developer Advice

By Nat Torkington
  1. The 212 Story Tower That Isn’t in Suburban Melbourne — A typo in an OpenStreetMap submission becomes a surprising monolith in Microsoft Flight Simulator.
  2. Fairness in Machine Learning — Draft text for a book on the subject.
  3. The Social Architecture of Impactful Communities — A really good set of models for communities. Individuals typically “hire” communities to accomplish transitions that require human connection. The major sections: Why do people join communities?; Member quality determines community success; Design your community to spark quality interactions; The two levels of group cohesion; Recognizing and retaining key members; Growing your ranks; A Time to Build.
  4. Letters to a New Developer — A series of articles of advice to early-stage programmers, such as Don’t Try to Change the Tabbing/Bracing Style and On Debugging.

Four short links: 19 August 2020

Notebook Design, Decision Records, Computational Embroidery, and Neuromorphic Chips

By Nat Torkington
  1. The Design Space of Computational Notebooks — Looked at 60 notebook systems and grouped 10 design space dimensions into four major stages of a data science workflow: importing data into notebooks (data sources), editing code and prose (editor style, supported programming languages, versioning, collaboration), running code to generate outputs (cell execution order, liveness [6], execution environment, and cell outputs), and publishing notebook outputs.
  2. Architecture Decision Records — The whys and hows of documenting architecture decisions. Future you will thank currently-present-you-but-past-you-by-the-time-it-is-useful.
  3. PEmbroider — an open library for computational embroidery with Processing.
  4. Neuromorphic Chips Take Shape — Neuromorphic chips are packed with artificial neurons and artificial synapses that mimic the activity spikes that occur within the human brain—and they handle all this processing on the chip. This results in smarter, far more energy-efficient computing systems. Outline of the what and why, with a few examples.

Four short links: 14 August 2020

Endpoint Security, Info from Invoices, Trusting Data, and ARGs and Conspiracy Theories

By Nat Torkington
  1. Sinter — Sinter uses the user-mode EndpointSecurity API to subscribe to and receive authorization callbacks from the macOS kernel, for a set of security-relevant event types. The current version of Sinter supports allowing/denying process executions; in future versions we intend to support other types of events such as file, socket, and kernel events. Inspired by Google Santa (Santa because it decides if executables are naughty or nice), but aiming to vet more than executables.
  2. Extracting Info from Invoices — Turns out this is a double-hard problem: hard to get the algorithm good, and hard to get training sets. Info extraction datasets are, well, full of information. And if the info you want to extract is financial, or personally-identifying, or otherwise sensitive, then there aren’t generally freely-available training sets. There is no training dataset for invoices.
  3. Why is Science Hard for People to Trust — An interesting set of ideas, but these sentences have been echoing around my head: We hate being wronged, and it makes us vengeful. On the other hand, we don’t necessarily love being “done right by,” and we don’t have a particular motivation that comes from it. There’s no “positive” version of revenge. I wonder how this changes social software design.
  4. What ARGs Can Teach Us About QAnon — (Adrian Hon) A very thoughtful comparison between ARGs and conspiracy theories. These are useful steps but will not stop QAnon from spreading in social media comments or private chat groups or unmoderated forums. It’s not something we can reasonably hope for, and I don’t think there’s any technological solution (e.g. browser extensions) either. The only way to stop people from mistaking speculation from fact is for them to want to stop.

Four short links: 11 Aug 2020

Immutable Database, Smart Mask, CyberCode Game, and Python Security Auditing

By Nat Torkington
  1. ImmuDB — lightweight, high-speed immutable database for systems and applications. Open Source and easy to integrate into any existing application. Latest version provides multitenancy.
  2. Smart Mask — (CNN) Japanese startup Donut Robotics […] created a smart mask — a high-tech upgrade to standard face coverings, designed to make communication and social distancing easier. In conjunction with an app, the C-Face Smart mask can transcribe dictation, amplify the wearer’s voice, and translate speech into eight different languages. Masks are the latest wearables.
  3. CyberCode Online — a Cyber Punk inspired, Text Based MMORPG Browser Game where gameplay interfaces are ‘Stealthily’ mimicking the VSCode interface. VSCode has such huge mindshare, people are copying its interface for games.
  4. pysa — Facebook’s static analysis tool for finding security problems in Python code. It’ll find data flow problems: Pysa performs iterative rounds of analysis to build summaries to determine which functions return data from a source and which functions have parameters that eventually reach a sink. If Pysa finds that a source eventually connects to a sink, it reports an issue. SQL injections are where data from a web form eventually makes it to a SQL request.
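
The source-to-sink flow described for Pysa above is easy to picture with a small example. This sketch does not run Pysa or use its configuration; it only shows, with Python's standard `sqlite3` module, the kind of path a taint analysis looks for. The table, the function names, and the payload are all made up for illustration.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with two users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a1"), ("bob", "b2")],
    )
    return conn


def lookup_unsafe(conn, name):
    # Source reaches sink: user-controlled text is pasted into the SQL string.
    query = f"SELECT secret FROM users WHERE name = '{name}'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(conn, name):
    # Parameterized query: the input is passed as data, never parsed as SQL.
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


conn = make_db()
form_input = "x' OR '1'='1"  # hypothetical value typed into a web form
print(lookup_unsafe(conn, form_input))  # the injected OR clause matches every row
print(lookup_safe(conn, form_input))    # no user has this literal name
```

In the unsafe version the form value travels from its source into the string handed to `execute`, which is the sink. The safe version breaks that path by binding the value as a parameter.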

Four short links: 7 Aug 2020

Operations Research, Shell and Programming Language, Knowledge Graphs, and Visual Programming

By Nat Torkington
  1. Surprising Economics of Load-Balanced Systems — I have a system with c servers, each of which can only handle a single concurrent request, and has no internal queuing. The servers sit behind a load balancer, which contains an infinite queue. An unlimited number of clients offer c * 0.8 requests per second to the load balancer on average. In other words, we increase the offered load linearly with c to keep the per-server load constant. Once a request arrives at a server, it takes one second to process, on average. How does the client-observed mean request time vary with c?
  2. Crush — Crush is an attempt to make a traditional command line shell that is also a modern programming language. It has the features one would expect from a modern programming language like a type system, closures and lexical scoping, but with a syntax geared toward both batch and interactive shell usage. I’m not convinced this is where programming belongs, but some of the examples are shell power-user dreams.
  3. Deep Graph Learning Knowledge Embedding — DGL-KE is a high performance, easy-to-use, and scalable package for learning large-scale knowledge graph embeddings. The package is implemented on the top of Deep Graph Library (DGL) and developers can run DGL-KE on CPU machine, GPU machine, as well as clusters. Open source, from Amazon. (via Amazon Science)
  4. Flume — Open source React library to create your own visual programming language (drag-and-drop function nodes with connectors between them).
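
The load-balancer question in the first link can be explored numerically. The sketch below assumes Poisson arrivals and exponentially distributed service times (an M/M/c queue), which the quoted setup does not state outright since it only says "on average". Under that assumption the Erlang C formula gives the probability that a request has to queue. The function names are made up for illustration.

```python
def erlang_c(c: int, offered_load: float) -> float:
    """Probability that an arriving request must wait in an M/M/c queue.

    offered_load is arrival rate times mean service time (in Erlangs).
    Uses the Erlang B recursion, which avoids huge factorials.
    """
    if offered_load >= c:
        raise ValueError("unstable: offered load must be below server count")
    b = 1.0
    for k in range(1, c + 1):
        b = offered_load * b / (k + offered_load * b)
    return c * b / (c - offered_load * (1.0 - b))


def mean_latency(c: int, per_server_load: float = 0.8,
                 service_time: float = 1.0) -> float:
    """Mean client-observed time: queueing delay plus service time."""
    a = c * per_server_load  # offered load grows linearly with c
    wait = erlang_c(c, a) * service_time / (c - a)
    return wait + service_time


for servers in (1, 2, 5, 10, 50):
    print(servers, round(mean_latency(servers), 3))
```

With one server this reduces to the familiar single-queue result of 5 seconds at 80% utilization. As c grows with per-server load held constant, the mean time falls toward the bare one-second service time, which is the surprising economy of scale the post is about.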

Four short links: 5 August 2020

Autistic Developer, NyanSat, Real-World CRDTs, and Emergent Behaviour

By Nat Torkington
  1. Tales of the Autistic Developer – Senior Developer — Interesting article from a senior developer with Autism Spectrum Disorder. They talk about the problems it causes, how they solved each problem, and how those problems even became strengths. You can’t read this and not have more empathy for neurodiverse programmers.
  2. NyanSat: The Open Source LEO Satellite Tracker — (Wired) — With one of the devices up and running, you can point NyanSat’s antenna to specific coordinates in the sky and listen for the radio frequency transmissions coming from a satellite that’s out there.
  3. PushPin: Towards Production-Quality Peer-to-Peer Collaboration — (PDF) If you want to write multiplayer (real-time collaborative) software like Google Docs, you need to use Conflict-Free Replicated Data Types (CRDTs). They’re not always easy and obvious in the real world, and this paper describes the author’s experiences using CRDTs to build a local-first collaborative app. They describe the architecture, problems, and solutions. It’s a good starting point for anyone who has dreams of building such apps.
  4. Emergent Behaviour in Balloon Navigation — (Medium) Machine learning sits at the heart of balloon navigation in Google’s Project Loon. This article is about the unprogrammed behaviours such as tacking, loitering, and figure 8s, and why they emerge (with animations).
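
For a feel of what a CRDT is, here is a minimal sketch of one of the simplest, a grow-only counter. It is a textbook example and not code from the PushPin paper: each replica increments only its own slot, and merging takes the per-slot maximum, so replicas converge regardless of the order or number of merges. The class and method names are assumptions.

```python
class GCounter:
    """Grow-only counter CRDT: one slot per replica, merged by per-slot max."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> increments seen from that replica

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        current = self.counts.get(self.replica_id, 0)
        self.counts[self.replica_id] = current + amount

    def merge(self, other: "GCounter") -> None:
        # Max is commutative, associative, and idempotent, so merges
        # can arrive in any order and be repeated safely.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self) -> int:
        return sum(self.counts.values())


laptop, phone = GCounter("laptop"), GCounter("phone")
laptop.increment(2)  # edits made offline on one device
phone.increment(3)   # concurrent edits on another
laptop.merge(phone)
phone.merge(laptop)
print(laptop.value(), phone.value())
```

Real collaborative apps need richer types (sequences, maps, text), but they rest on the same property: merge is designed so that no coordination is needed for replicas to agree.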

Four short links: 31 July 2020

SQL Migration, Hypertext Framework, Scaling RDBMS, and Account Recovery

By Nat Torkington
  1. Migrating a 40TB SQL Server Database — A horror story to tell around the campfire. I was struck by the observation in this Hacker News comment that It’s typical that a log will be accessed zero times. Collecting, aggregating, and indexing logs is usually a mistake made by people who aren’t clear on the use case for the logs.
  2. Hyperapp — an ultra-lightweight Virtual DOM, highly-optimized diff algorithm, and state management library obsessed with minimalism.
  3. Scaling Relational Databases — Update the database; scale vertically; leverage application code; use efficient data types; data normalization and denormalization; precompute data; leverage materialized views; use proper indexes; leverage the execution plan for query optimization; choose correct transaction isolation level; bulk INSERTs and UPDATEs; compress data for storage; make ALTER TABLEs work; manage concurrent connections; add read replicas; disk partitioning; use specialized extensions; sharding; don’t store everything in one table; process data outside the SQL database; be aware of the limitations of managed SQL databases.
  4. Secrets, Lies, and Account Recovery: Lessons from the Use of Personal Knowledge Questions at Google — We conclude that it appears next to impossible to find secret questions that are both secure and memorable. Secret questions continue to have some use when combined with other signals, but they should not be used alone and best practice should favor more reliable alternatives.