Four Short Links

Nat Torkington’s eclectic collection of curated links.

Four short links: 22 January 2020

Unending Projects, Work/Life Game, Software Characterization, Team Dynamics

By Nat Torkington
  1. Elements of Scheduling — notable for several things, but my eye was caught by: finite convergence to completion fell beyond our reach. I know that state.
  2. Dungeons and Deadlines — a game of work/life balance.
  3. Microsoft Application Inspector — open source software characterization source code analyzer that helps you understand what a program does by identifying interesting features and characteristics using static analysis and a customizable json-based rules engine.
  4. Understanding Team Dynamics — We find that highly successful teams are significantly more focused than average teams of the same size, that their members have worked on more diverse sets of projects, and the members of highly successful teams are more likely to be core members or “leads” of other teams.

Four short links: 21 January 2020

Network Visualization, Computational Notebooks, Computing History, and Preserving Privacy

By Nat Torkington
  1. Cytoscape — an open source software platform for visualizing complex networks and integrating these with any type of attribute data.
  2. What’s Wrong with Computational Notebooks? Pain Points, Needs, and Design Opportunities — Our findings suggest that data scientists face numerous pain points throughout the entire workflow—from setting up notebooks to deploying to production—across many notebook environments. Our data scientists report essential notebook requirements, such as supporting data exploration and visualization. The results of our study inform and inspire the design of computational notebooks.
  3. Advent of Computing — podcast of computing history.
  4. Privacy-Preserving Record Linkage — toolbox for deterministic, probabilistic, and privacy-preserving record linkage techniques.

Four short links: 20 January 2020

AR Lenses, Faux Keyboard Noises, Tech Villainy, and Data Tests

By Nat Torkington
  1. AR Contact Lens — The path ahead is not a short one; contact lenses are considered medical devices and therefore need US Food and Drug Administration (FDA) approval. But the Mojo Lens has been designated as an FDA Breakthrough Device, which will speed things up a little. And clinical studies have begun.
  2. Bucklespring — This project emulates the sound of my old faithful IBM Model-M space saver bucklespring keyboard while typing on my notebook, mainly for the purpose of annoying the hell out of my coworkers.
  3. Orange Badge (Tim Bray) — At some point, it’s going to be a real problem being management in a sector that’s widely feared and distrusted. But we in the tech tribe haven’t really internalized much about this yet. This. Silicon Valley failed to die a hero, so has lived long enough to see itself become the villain.
  4. Great Expectations — Pipeline tests are applied to data (instead of code) and at batch time (instead of compile or deploy time). Pipeline tests are like unit tests for datasets: they help you guard against upstream data changes and monitor data quality. Software developers have long known that automated testing is essential for managing complex codebases. Great Expectations brings the same discipline, confidence, and acceleration to data science and engineering teams.
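
To make the idea of pipeline tests concrete, here is a minimal sketch in plain Python. It does not use the Great Expectations API; the helper names (expect_not_null, expect_between, validate_batch) and the sample rows are hypothetical, chosen only to show checks that run against each batch of data rather than against code.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Expectation = Callable[[List[Row]], bool]


def expect_not_null(column: str) -> Expectation:
    """Pass only if every row has a non-None value in `column`."""
    return lambda rows: all(row.get(column) is not None for row in rows)


def expect_between(column: str, low: float, high: float) -> Expectation:
    """Pass only if every value in `column` is a number within [low, high]."""
    def check(rows: List[Row]) -> bool:
        for row in rows:
            value = row.get(column)
            if not isinstance(value, (int, float)) or not low <= value <= high:
                return False
        return True
    return check


def validate_batch(rows: List[Row], expectations: Dict[str, Expectation]) -> List[str]:
    """Run every expectation against one batch; return the names that failed."""
    return [name for name, check in expectations.items() if not check(rows)]


# Hypothetical expectations for a hypothetical "users" dataset.
EXPECTATIONS = {
    "user_id is never null": expect_not_null("user_id"),
    "age is between 0 and 120": expect_between("age", 0, 120),
}

good_batch = [{"user_id": 1, "age": 34}, {"user_id": 2, "age": 51}]
bad_batch = [{"user_id": 3, "age": 34}, {"user_id": None, "age": 430}]

print(validate_batch(good_batch, EXPECTATIONS))  # []
print(validate_batch(bad_batch, EXPECTATIONS))   # both expectation names
```

Running the checks on every incoming batch, rather than once at deploy time, is what would catch an upstream change such as a column that suddenly starts arriving empty.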

Four short links: 17 January 2020

Cursed Filesystem, Many Cats, Speech Processing, and Standard Operating Procedure

By Nat Torkington
  1. cursedfs — Make a disk image formatted with both ext2 and FAT at once. Silliness!
  2. cats — Here, placed side-by-side for comparison, are GNU’s implementation of cat, Plan 9’s implementation, Busybox’s implementation, and NetBSD’s implementation, Seventh Edition Unix (1979), Tenth Edition Unix (1989), and 4.3BSD. There’s a lot to learn from the differences!
  3. wav2letter++ — a fast, open source speech processing toolkit the speech team at Facebook AI Research built to facilitate research in end-to-end models for speech recognition. It is written entirely in C++ and uses the ArrayFire tensor library and the flashlight machine learning library for maximum efficiency.
  4. Work is Work (Coda Hale) — Neither your employee handbook nor your calendar are accurate depictions of how work in the organization is done. Unless your organization is staffed with zombies, members of the organization will constantly be subverting standard operating procedure in order to get actual work done. Even ants improvise. (via Ben Gracewood)

Four short links: 16 January 2020

Zero Trust, Safeguarding Elections, Design Heuristics, and Image/Container Analysis

By Nat Torkington
  1. Zero Trust Architecture Principles — Ten principles to help you design and deploy a zero trust architecture. They are: know your architecture; create a single strong user identity; create a strong device identity; authenticate everywhere; know the health of your devices and services; focus your monitoring on devices and services; set policies according to the value of services or data; control access to your services and data; don’t trust the network, including the local network; choose services designed for zero trust.
  2. Ten Things Technology Platforms Can Do To Safeguard The 2020 US Election — (and everyone else’s elections, you bumptious yokels). They’re all good suggestions. Google, Twitter, and Facebook do not share common language or definitions for political ads—the primary social media companies should agree on a common, broad set of definitions for political ads and adopt them across platforms. Seems like “the limits of free speech online” is an issue without a widely agreed success condition, making it unsuited to the competing-and-changing nature of free enterprise, which thrives better in “sell more widgets / make more money” types of clear-cut goals. If there’ll never be a market-led solution, citizens should direct suggestions like this post to their government rather than to the companies themselves.
  3. Evidence-based Design Heuristics for Idea Generation — Observations go beyond products to consider multiple concepts generated for a given problem.
  4. Terrier — an image and container analysis tool that can be used to scan images and containers to identify and verify the presence of specific files according to their hashes.
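
The hash-matching idea behind a tool like Terrier can be sketched in a few lines of Python. This is not Terrier's implementation or interface: it assumes the image has already been unpacked into a directory, and the function names (sha256_of, find_known_files) and the sample files are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, Set


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_known_files(root: Path, known_hashes: Set[str]) -> Dict[str, str]:
    """Map relative path -> hash for each file under `root` whose hash is known."""
    matches: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            file_hash = sha256_of(path)
            if file_hash in known_hashes:
                matches[path.relative_to(root).as_posix()] = file_hash
    return matches


# Demo on a throwaway directory standing in for an unpacked image.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "bin").mkdir()
    (root / "bin" / "tool").write_bytes(b"known payload")
    (root / "notes.txt").write_bytes(b"something else")
    wanted = {hashlib.sha256(b"known payload").hexdigest()}
    print(find_known_files(root, wanted))
```

Matching on content hashes rather than file names means a renamed or relocated file is still found, which is the point of verifying presence "according to their hashes."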

Four short links: 15 January 2020

Sleep Deprivation, Lip Reading, Data Processing, and Rapid Product Development

By Nat Torkington
  1. Performance Degradation and Restoration During Sleep Deprivation (NCBI) — These results suggest that the brain adapts to chronic sleep restriction. In mild to moderate sleep restriction, this adaptation is sufficient to stabilize performance, although at a reduced level. These adaptive changes are hypothesized to restrict brain operational capacity and to persist for several days after normal sleep duration is restored, delaying recovery. Crunches are dangerous. (via Popular Science)
  2. Video-Driven Speech Reconstruction — aka: teaching a neural net to read lips.
  3. pxi — a small, fast, and magical command-line data processor similar to jq, mlr, and awk.
  4. iPod Timeline (Patrick Collison) — wow, that’s a hell of a timeline.

Four short links: 14 January 2020

Privacy Legislation, Bystander Effect, Computing Education, and Tech Adversaries

By Nat Torkington
  1. The 2019 Privacy Legislation Bomb Cyclone — I know from experience in edtech that the morass of states’ legislation doesn’t make life easy for startups. Brace for more, affecting everyone. We may have passed the days when you could do something online without needing an expert opinion from a lawyer.
  2. Cross-National CCTV Footage Shows That Intervention is the Norm in Public Conflicts (PDF) — It is important therefore to recognize a key distinction between the likelihood of individual intervention and the aggregate that at least someone provides help. Yet, in comparison to the vast number of studies that examine intervention from the perspective of the individual bystander, we know surprisingly little about the situational intervention likelihood—that is, the probability that at least one bystander at the emergency event intervenes. […] Using a unique cross-national video data set from the United Kingdom, the Netherlands, and South Africa (N = 219), we show that in nine of 10 public conflicts, at least one bystander, but typically several, will do something to help. Although the Bystander Effect means an individual may feel less likely to help, there’s a 90% chance that *someone* will help. Not guaranteed to apply in company meetings, however.
  3. Computing Education: What I Got Wrong — really interesting lessons from the trenches of changing computing education. We are much more likely to integrate CS into mathematics or science teacher programs than to have standalone CS teacher professional development—and even that will require an enormous effort. […] Even if you have classes, you might not get students taking them, or it may just be more of the same kinds of students […] Diverse participation is really hard. I still believe in the value of having students program for learning lots of different things, but I’m no longer convinced that the “hard fun” of Logo is the most useful or productive path for using the power of computing for learning. I am less interested in making things for just a few precocious students, especially if teachers hate it. I believe in making things with teachers. […] We can try to teach everyone about computational thinking, but that won’t get as far as improving the computing to help everyone’s thinking. Fix the environment, not the people.
  4. Tech Adversaries vs. Enemies (Alex Stamos) — excellent graduation speech. It is seductive to go along with the expectations of your boss, your colleagues, your shareholders, which you must resist. It can also be seductive to put yourself on a path where you might never be faced with hard decisions that you might regret or where you are free to always criticize without taking any ethical risks on your own.

Four short links: 13 January 2020

Simulated Customer, Symbolic Meets Statistical, Deep Fakes, and Online Radicalization

By Nat Torkington
  1. Simulated Customer — The site will randomly generate one of 40 different [sales] objections, and give you 20 seconds to answer it.
  2. From Shallow to Deep Interactions Between Knowledge Representation, Reasoning, and Machine Learning — This paper proposes a tentative and original survey of meeting points between knowledge representation and reasoning (KRR) and machine learning (ML), two areas which have been developing quite separately in the last three decades. […] This paper is the first step of a work in progress aiming at a better mutual understanding of research in KRR and ML, and how they could cooperate.
  3. NHK Raises the Dead to Mixed Reviews — Enka singer Hibari Misora graced the “Kohaku” stage for the first time in decades to perform a new song. Well, technically, it wasn’t Misora herself—she died in 1989. Rather, it was a life-like hologram performing this fresh tune thanks to Yamaha’s Vocaloid: AI, a piece of technology that can replicate voices. Deepfaked audio and imagery. (via Hacker News)
  4. Empirical Studies of Online Radicalization: A Review and Discussion — Only 18 studies that met Desmarais et al.’s (2017) stringent systematic review criteria empirically examined the radicalization process. Fewer still, presumably, examined the online radicalization process. Indeed, Hassan et al., (2018) conducted a systematic review specifically focused on the relationship between the impact of extremist online content and violent radicalization. Eleven studies fit their eligibility criteria. […] The emerging evidence base is also pretty clear. Those who are radicalized and/or commit acts of terrorism have generally been exposed to radicalizing content. Exposure to this content leads to affective, emotional, and behavioral change at each stage of the process. Of course, some of these studies have relatively small sample sizes, and are only focused on specific types of terrorists or geographical contexts. The key now is to replicate and build upon this preliminary evidence to give us a sense of not just whether exposure to ideological content in the online environment causes violent extremism, but also how, in what contexts and for whom? Is “exposure” sufficient whether it is in the virtual or physical world? Does it work differently for different people in different contexts?

Four short links: 10 January 2020

Automation UX, Awful AI, Neural Net Guitar Pedal, and Closed Web

By Nat Torkington
  1. Ten Challenges for Making Automation a “Team Player” in Joint Human-Agent Activity — it’s really interesting to read this and think how they might manifest in, e.g., a chatbot. I remember Jesse Robbins talking about Orion for emergency workers and how they were having to invent this stuff. It’s remarkable how we’ve gone through a chatbot boom and bust cycle without much forward progress in standardizing these things. (via The Morning Paper)
  2. Awful AI — a curated list to track current scary usages of AI—hoping to raise awareness to its misuses in society.
  3. A Neural Network Guitar Pedal — This neural network is trained to turn a guitar into a piano in real time. It’s not perfect but it’s still pretty amazing. (via Twitter)
  4. Web is Now Closed — Samuel Maddock has been trying to create a rival “indie” browser, and has been to each of the EME DRM vendors and has been sent away by all of them. This is appalling.

Four short links: 9 January 2020

Structuring Papers, State of the World 2020, Reading Big Difficult Books, and Storing Forever

By Nat Torkington
  1. Ten Simple Rules for Structuring Papers — Focus your paper on one central contribution, which you communicate in the title; write for flesh-and-blood human beings who do not know your work; stick to the context-content-conclusion (C-C-C) scheme; optimize your logical flow by avoiding zig-zag and using parallelism; tell a complete story in the abstract; get across why the paper matters in the introduction; communicate the results as a sequence of statements, supported by figures, that connect logically to support the central contribution; discuss how the gap was filled, the limitations of the interpretation, and the relevance to the field; allocate time where it matters: title, abstract, figures, and outlining; get feedback to reduce, reuse, and recycle the story.
  2. State of the World 2020 — Bruce Sterling and Jon Lebkowsky at The WELL. So in MMXX, we’re in a world situation that claims to be post-global and post-internet, and post world-trade, where everybody wants to take back control, be great again, assure sovereign cyberspace, set tariffs, jail immigrant tots, beat up ethnic minorities, nurture billionaires, ignore science, and reduce education to assure that there are fewer brainy chicks—but in practice, there’s no big difference among the players. They ALL do that. There’s next to no genuine cultural variety. They all use the same hardware, slogans, and techniques.
  3. A Note on Reading Big, Difficult Books (Brad DeLong) — We have our recommended ten-stage process for reading such big books: 1. Figure out beforehand what the author is trying to accomplish in the book. 2. Orient yourself by becoming the kind of reader the book is directed at—the kind of person with whom the arguments would resonate. 3. Read through the book actively, taking notes. 4. “Steelman” the argument, reworking it so that you find it as convincing and clear as you can possibly make it. 5. Find someone else—usually a roommate—and bore them to death by making them listen to you set out your “steelmanned” version of the argument. 6. Go back over the book again, giving it a sympathetic but not credulous reading. 7. Then you will be in a good position to figure out what the weak points of this strongest-possible argument version might be. 8. Test the major assertions and interpretations against reality: do they actually make sense of and in the context of the world as it truly is? 9. Decide what you think of the whole. 10. Then comes the task of cementing your interpretation, your reading, into your mind so that it becomes part of your intellectual panoply for the future.
  4. Perkeep — Camlistore gets a new name. A set of open source formats, protocols, and software for modeling, storing, searching, sharing, and synchronizing data in the post-PC era. Data may be files or objects, tweets or 5TB videos, and you can access it via a phone, browser, or FUSE filesystem.

Four short links: 8 January 2020

Running Unconferences, Media Server, Lyft's Workflow Tool, and Bandwidth Utilization

By Nat Torkington
  1. Ten Simple Rules for Organizing an Unconference — academia-targeted, but generally useful, advice for running unconferences.
  2. Jellyfin — free software media server.
  3. Flyte — a structured programming and distributed processing platform for highly concurrent, scalable, and maintainable workflows from Lyft. Intro blog post lays out the case, and this blog post describes the differences between Flyte and Apache Airflow.
  4. bandwhich — terminal-based bandwidth utilization tool.

Four short links: 7 January 2020

Coding Interview Problems, Coder Stratification, Writing a Compiler, and Distributed Execution Framework

By Nat Torkington
  1. Coding Interview Problems Solved in Go — see also some in Rust, and the best coding interview take ever, by Aphyr. Because thinly veiled excuses to use dynamic programming or graph coloring are the “Hello world” of our Google-aspirational age. (via Hacker News)
  2. Coding Will Divide Along Class Lines (Mike Loukides) — The programming world will increasingly be split between highly trained professionals and people who don’t have a deep background but have a lot of experience building things. The former group builds tools, frameworks, languages, and platforms; the latter group connects things and builds websites, mobile apps, and the like. This divide will mean different tools and training for each.
  3. A Compiler Writing Journey — In this GitHub repository, I’m documenting my journey to write a self-compiling compiler for a subset of the C language. I’m also writing out the details so that, if you want to follow along, there will be an explanation of what I did, why, and with some references back to the theory of compilers.
  4. Ray — a distributed execution framework that makes it easy to scale your applications and to leverage state-of-the-art machine learning libraries. See this introductory post for the rationale.

Four short links: 6 January 2020

OS Forks, WASM OS, Mediating Consent, and Computational Cinematography

By Nat Torkington
  1. An Excess of Operating Systems (Jean-Louis Gassée) — Fuchsia exists for technical reasons, but Samsung’s, Amazon’s, Huawei’s, etc., are all for business reasons (not wanting to tithe or be tied strategically to Google).
  2. Redshirt — The redshirt operating system is an experiment to build some kind of operating-system-like environment where executables are all in WASM and are loaded from an IPFS-like decentralized network. […] There exists three core syscalls (send a message, send an answer, wait for a message), and everything else is done by passing messages between processes or between a process and the “kernel.” Programs don’t know who they are sending the message to. One person’s dream is another’s nightmare.
  3. Mediating Consent (Renee DiResta) — essay on manufacturing consent in the social media age. The path forward requires systems to facilitate mediating, not manufacturing, consent. We need a hybrid form of consensus that is resistant to the institutional corruption of top-down control, and welcomes pluralism, but is also hardened against bottom-up gaming of social infrastructure by malign actors.
  4. Synopsis — a suite of open source software for computational cinematography—tools that help the creation of visual media. Synopsis is built to help editors, artists, indie film makers, A/V developers, and creators do what they do best—tell stories, make experiences, and build amazing tools.

Four short links: 3 January 2020

Portable Scripts, Training Actors, Cyber Law, and Government Data

By Nat Torkington
  1. Chesterton’s Shell Script (Pete Warden) — those who forget Perl’s Configure.sh are doomed to recreate it. “Congratulations, you’re not running Eunice!”
  2. Light (Facebook AI) — a large-scale fantasy text adventure game research platform for training agents that can both talk and act, interacting either with other models or with humans. (Via the introductory blog post.)
  3. International Cyber Law in Practice: Interactive Toolkit — At its heart, it consists of 13 hypothetical scenarios, to which more will be added in the future. Each scenario contains a description of cyber incidents inspired by real-world examples, accompanied by detailed legal analysis. The aim of the analysis is to examine the applicability of international law to the scenarios and the issues they raise.
  4. UX of Bushfire Maps (Ellen Broad) — classic government map/data problem: each state/agency has its own map, showing its own view of the world, and they don’t even use the same symbols. Which makes life miserable for people who don’t care about the org chart, they just want to learn something their government knows — like whether their house will burn today. (Via Merrin Macleod.)

Four short links: 2 January 2020

Voice Assistant, Public Domain, Bing Disinformation, and Knowledge Bases

By Nat Torkington
  1. Rhasspy — an open source, fully offline voice assistant toolkit for many languages that works well with Home Assistant, Hass.io, and Node-RED.
  2. Public Domain Day 2020 — Forster’s “A Passage to India,” Gershwin’s “Rhapsody in Blue,” and the first film adaptation of Peter Pan are amongst the works entering the public domain in the US.
  3. Bing’s Top Search Results Contain an Alarming Amount of Disinformation — In general, Bing returns disinformation and misinformation at a significantly higher rate than Google does. In general, Bing directs users to conspiracy-related content, even if they aren’t explicitly looking for it. Bing shows users Russian propaganda at a much higher rate than Google does. Bing places student-essay sites—sites where students post or sell past papers—in its top 50 results for certain queries. Bing dredges up gratuitous white-supremacist content in response to unrelated queries.
  4. Outline — wiki and knowledge base for growing teams. Beautiful, feature rich, markdown compatible, and open source.

Four short links: 1 January 2020

Think Like a Programmer, Do Good Deeds, Command-line Trello-like Tool, and Advice for a New Executive

By Nat Torkington
  1. Seven Ways to Think Like a Programmer — 1. It’s all just data. 2. Data doesn’t mean anything on its own—it has to be interpreted. 3. Programming is about creating and composing abstractions. 4. Models are for computers, and views are for people. 5. Paranoia makes us productive. 6. Better algorithms are better than better hardware. 7. The tool shapes the hand.
  2. Using FOIA Data and Unix to Halve Major Source of Parking Tickets — a reminder of one of the best things to happen in the 2010s: automating good deeds.
  3. Taskbook — Tasks, boards, and notes for the command-line habitat.
  4. Advice for a New Executive (Lara Hogan) — Chad’s advice to Lara when she joined Kickstarter. 1. Find/create a peer support group. 2. Partner absurdly closely with product and make sure you understand priorities and the head of product understands tradeoffs. 3. Focus on delivery of the roadmap and everything else will follow. 4. Ask your executive peers regularly what you can do to make their jobs easier—particularly the CEO. 5. Take a stand when you need to. 6. Always have a story. 7. Read widely—offline!—about management and leadership. 8. Realize the impact your mood and demeanor has on people. 9. Develop the right relationship with members of your company’s board. From August, but it holds up very well.

Four short links: 31 December 2019

Learn Assembly, Quantum Puzzles, Ghost Characters, and Computer Networks

By Nat Torkington
  1. Microcorruption — You’ve been given access to a device that controls a lock. Your job: defeat the lock by exploiting bugs in the device’s code. Fun way to learn assembly language and debugging.
  2. Meqanic — quantum computer puzzle game.
  3. Unicode’s Ghost Characters — after the JIS standard was released, people noticed something strange—several of the added characters had no obvious sources, and nobody could tell what they meant or how they should be pronounced. Nobody was sure where they came from. These are what came to be known as the ghost characters.
  4. Computer Networks: A Systems Approach (GitHub) — textbook released under CC.

Four short links: 30 December 2019

Dynamic Graphs, Gamification, hipsterDB, and JavaScript Testing

By Nat Torkington
  1. GraphStream — a Java library for the modeling and analysis of dynamic graphs. You can generate, import, export, measure, layout, and visualize them.
  2. Governing by Video Game — “Real participation—and this is important to clarify—is not a game. It takes time. It takes energy. That’s why not many people participate,” Sugeo says. On the other hand, making it clear that an activity is supposed to be a bit of fun, à la CitySwipe, immediately downgrades the seriousness with which participants engage. “So you’re probably attracting more people to the simplified version and still not solving the problem of engagement.”
  3. hipsterDB — hipsterDB is a key/value store that only returns data as long as it isn’t mainstream. The more often that you access a key, the more mainstream it becomes. After data has gone mainstream, you will have to wait for it to go out of style before using it again. Satire, duh.
  4. JavaScript and Node.js Testing Best Practices — covering test anatomy, back end, front end, measuring test effectiveness, and continuous integration.
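
The behaviour described for hipsterDB in item 3 (reads succeed until a key becomes too popular, then fail until it cools off) can be sketched as a small in-memory store. This is a guess at the mechanics, not hipsterDB's code: the class name, the max_hits and window_seconds parameters, and the sliding-window rule are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, List


class TooMainstreamError(Exception):
    """Raised when a key has been read too often recently."""


class HipsterStore:
    """Key/value store that refuses reads of keys that have gone mainstream."""

    def __init__(
        self,
        max_hits: int = 3,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._data: Dict[str, object] = {}
        self._hits: Dict[str, List[float]] = {}
        self._max_hits = max_hits
        self._window = window_seconds
        self._clock = clock  # injectable so the cooldown can be tested

    def put(self, key: str, value: object) -> None:
        self._data[key] = value

    def get(self, key: str) -> object:
        if key not in self._data:
            raise KeyError(key)
        now = self._clock()
        # Keep only the reads that still fall inside the popularity window.
        recent = [t for t in self._hits.get(key, []) if now - t < self._window]
        self._hits[key] = recent
        if len(recent) >= self._max_hits:
            raise TooMainstreamError(key)
        recent.append(now)
        return self._data[key]


store = HipsterStore(max_hits=2, window_seconds=60.0)
store.put("band", "you have not heard of them")
print(store.get("band"))
print(store.get("band"))
try:
    store.get("band")
except TooMainstreamError as error:
    print("too mainstream:", error)
```

Once the earlier reads age out of the window, the key goes "out of style" and can be read again.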