Four Short Links

Nat Torkington’s eclectic collection of curated links.

Four short links: 11 October 2019

Resilience Engineering, Ancient Emulators, Long Timespan Design, and Exporting Section 230

By Nat Torkington
  1. Resilience Engineering Papers — This doc contains notes about people active in resilience engineering, as well as some influential researchers who are no longer with us, organized alphabetically. It also includes people and papers from related fields, such as cognitive systems engineering and naturalistic decision-making.
  2. Sweet 16 — a metaprocessor or “pseudo microprocessor” implemented in 6502 assembly language. Originally written by Steve Wozniak and used in the Apple II, Sweet 16 can also be ported to other 6502-based systems to provide useful 16-bit functionality. This article includes the source code for Sweet 16, along with a brief history, programming instructions, and notes to help port it. I was amazed at how soon emulators appear in the history of computing—e.g., John Backus’s Speedcode from 1953.
  3. Thinking, Storytelling, and Designing with Long Timespans — syllabus for class taught by Stuart Candy at the Long Now Foundation. (via Twitter)
  4. Section 230 Going Into Trade Deals (NYT) — The protections, which stem from a 1990s law, have already been tucked into the administration’s two biggest trade deals—the United States-Mexico-Canada Agreement and a pact with Japan that President Trump signed on Monday. American negotiators have proposed including the language in other prospective deals, including with the European Union, Britain, and members of the World Trade Organization. […] The American rules, codified in Section 230 of the Communications Decency Act, shield online platforms from many lawsuits related to user content and protect them from legal challenges stemming from how they moderate content. Those rules are largely credited with fueling Silicon Valley’s rapid growth. The language in the trade deals echoes those provisions but contains some differences.

Four short links: 10 October 2019

Unix Passwords, Remote Foo, Text Graphics, and AI in AppInventor

By Nat Torkington
  1. Ken Thompson’s Unix Password — Somewhere around 2014 I found an /etc/passwd file in some dumps of the BSD 3 source tree, containing passwords of all the old timers such as Dennis Ritchie, Ken Thompson, Brian W. Kernighan, Steve Bourne, and Bill Joy. Those passwords are very amenable to modern cracking methods, but Thompson’s was the last to be cracked…
  2. How to Run a Remote-First Open-Space Un-Conference — neat!
  3. Libcaca — a graphics library that outputs text instead of pixels so that it can work on older video cards or text terminals.
  4. MIT’s AppInventor Now Does AI — AI with MIT App Inventor includes tutorial lessons as well as suggestions for student explorations and project work. Each unit also includes supplementary teaching materials: lesson plans, slides, unit outlines, assessments and alignment to the Computer Science Teachers of America (CSTA) K12 Computing Standards.

Four short links: 9 October 2019

Data Playbook, Global Politics Meets Tech, ML Models, and Lock-free Programming

By Nat Torkington
  1. IFRC Data Playbook Toolkit — The Data Playbook Beta is a recipe book or exercise book with examples, best practices, how-to’s, session plans, training materials, matrices, scenarios, and resources. The data playbook will provide resources for National Societies to develop their literacy around data, including responsible data use and data protection. The content aims to be visual, remixable, collaborative, useful, and informative.
  2. The China Cultural Clash — as more companies have a financial interest in China (either partially owned by, or hoping to sell hard into), employees and users are being discouraged from sharing opinions that China disagrees with.
  3. 150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com (Adrian Colyer) — Oddly enough given the paper title, the six lessons are never explicitly listed or enumerated in the body of the paper, but they can be inferred from the division into sections. My interpretation of them is as follows: (1) Projects introducing machine learned models deliver strong business value; (2) Model performance is not the same as business performance; (3) Be clear about the problem you’re trying to solve; (4) Prediction serving latency matters; (5) Get early feedback on model quality; (6) Test the business impact of your models using randomized controlled trials (follows from #2).
  4. Awesome Lock-Free — A collection of resources on wait-free and lock-free programming.

Four short links: 8 October 2019

Visual AI Tools, Software Design, Image Processing Chains, and Music Player Firmware

By Nat Torkington
  1. Yellowbrick — Open source visual analysis and diagnostic tools to facilitate machine learning model selection.
  2. Eight Habits of Expert Software Designers: An Illustrated Guide — Experts imagine how a design will work—simulating aspects of the envisioned software and how the different parts of the design support a variety of scenarios. When working with others, experts regularly walk through a design by verbalizing its operation step by step. When alone, they simulate mentally, exercising the design repeatedly over time. This fits with my sense of good programmers as good simulators.
  3. ImagePlay — A rapid prototyping tool for building and testing image processing algorithms. It comes with a variety of over 70 individual image processors that can be combined into complex process chains. ImagePlay is completely open source and can be built for Windows, Mac, and Linux.
  4. Rockbox — A free replacement firmware for digital music players. It runs on a wide range of players.

Four short links: 7 October 2019

Screen Addiction, Data Viz, Algorithmic Bias, and Tools for Thought

By Nat Torkington
  1. Addicted to Screens? That’s Really a You Problem (NY Times) — In “Indistractable,” which was published last month, Mr. Eyal has written a guide to free people from an addiction he argues they never had in the first place. It was all just sloughing off personal responsibility, he figures. So the solution is to reclaim responsibility in myriad small ways. For instance: have your phone on silent so there will be fewer external triggers. Email less and faster. Don’t hang out on Slack. Have only one laptop out during meetings. Introduce social pressure like sitting next to someone who can see your screen. Set “price pacts” with people so you pay them if you get distracted—though be sure to “learn self-compassion before making a price pact.”
  2. The Perceptual and Cognitive Limits of Multivariate Data Visualization — Almost all data visualizations are multivariate (i.e., they display more than one variable), but there are practical limits to the number of variables that a single graph can display. These limits vary depending on the approach that’s used. Three graphical approaches are currently available for displaying multiple variables: (1) encode each variable using a different visual attribute; (2) encode every variable using the same visual attribute; (3) increase the number of variables using small multiples. In this article, we’ll consider each.
  3. Any Sufficiently Advanced Neglect is Indistinguishable from Malice: Assumptions and Bias in Algorithmic Systems — A harm created through persistent ignorance, through willful ignorance of harm raised, is not necessarily very different from harm intentionally done.
  4. How Can We Develop Transformative Tools for Thought? — We believe now is a good time to work hard on this vision again. In this essay, we sketch out a set of ideas we believe can be used to help develop transformative new tools for thought. In the first part of the essay, we describe an experimental prototype system we’ve built, a kind of mnemonic medium intended to augment human memory. This is a snapshot of an ongoing project, detailing both encouraging progress as well as many challenges and opportunities. In the second part of the essay, we broaden the focus. We sketch several other prototype systems, and we address the question: why is it that the technology industry has made comparatively little effort developing this vision of transformative tools for thought?

Four short links: 4 October 2019

Understanding SQL, Pricing, App Configuration, and Internet Jurisdiction

By Nat Torkington
  1. SQL Queries Don’t Start with SELECT (Julia Evans) — today I learned…
  2. Vickrey Auctions to Discover Demand Curve — this is gold!
  3. Hydra — A framework for elegantly configuring complex applications. A Python library that handles config, command-line flags, logging, etc.
  4. EU Wants Global Control of Facebook Content (Verge) — On Thursday, the European Union’s top court ruled that lower court judges could order Facebook to remove illegal comments from its platform, expanding on the power individual countries have to extend content bans across the world. See this thread for more commentary. Aside from substance issues, there is a major process issue: this ruling affects billions of Facebook users, but they were not represented in court. Important legal arguments about their rights were simply not raised or considered.

Four short links: 3 October 2019

Content Moderation, Go ORM, Adversarial Interoperability, and Random Sample Elections

By Nat Torkington
  1. Why Do Companies With Huge Resources Still Have Terrible Moderation? — an extremely readable explanation of why it’s so damn hard. Hint: AI isn’t it.
  2. ent — An entity framework for Go. Simple, yet powerful ORM for modeling and querying data. Open source, from Facebook.
  3. Adversarial Interoperability (Cory Doctorow) — collection of articles on when you create a new product or service that plugs into the existing ones without the permission of the companies that make them.
  4. Random Sample Elections (David Chaum) — The number of voters sampled can be small, depending on how close the contest, yet give overwhelming confidence. For instance, if the margin is at least 10%, then a thousand votes will likely yield a result that itself, without any assumption about the margin and with only a one-in-a million chance of error, establishes that a majority are in favor—even with an electorate of millions or billions. This dramatic reduction in the number of voters participating in each election compared to a conventional election today yields a substantially proportionate reduction in cost.
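
The sampling argument in the Random Sample Elections item rests on binomial tail probabilities, which are easy to explore. Below is a minimal sketch, assuming each sampled voter is an independent draw and the electorate is effectively infinite; the function names are hypothetical and this is not Chaum's own calculation. It does not try to reproduce the quoted one-in-a-million figure, which depends on how the paper defines margin and error.

```python
from fractions import Fraction
from math import comb


def tail_probability(n, k, p):
    """Exact P(X >= k) for X ~ Binomial(n, p); p is a Fraction or a string like "0.55"."""
    p = Fraction(p)
    a, b = p.numerator, p.denominator
    # Sum the integer numerators of each term, then divide once by b**n.
    total = sum(comb(n, i) * a**i * (b - a) ** (n - i) for i in range(k, n + 1))
    return Fraction(total, b**n)


def wrong_majority_risk(n, yes_share):
    """Chance that a sample of n voters shows no strict 'yes' majority
    even though the true 'yes' share is yes_share (assumed > 1/2)."""
    no_share = 1 - Fraction(yes_share)
    # 'yes' fails to win the sample when 'no' collects at least n - n // 2 votes.
    return tail_probability(n, n - n // 2, no_share)


def evidence_against_tie(n, yes_votes):
    """Chance of seeing at least yes_votes 'yes' ballots in a sample of n
    if the electorate were split exactly 50/50."""
    return tail_probability(n, yes_votes, Fraction(1, 2))


if __name__ == "__main__":
    # Illustrative inputs only: a 55/45 electorate and samples of various sizes.
    for n in (100, 1000, 2000):
        print(n, float(wrong_majority_risk(n, Fraction(11, 20))))
    print(float(evidence_against_tie(1000, 550)))
```

Exact rational arithmetic keeps very small tail probabilities from being lost to floating-point rounding; the values are converted to floats only for printing.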

Four short links: 2 October 2019

Data Fallacies, Transparency Reports, Encryption, and Experimental Declarative Programming Language

By Nat Torkington
  1. Data Fallacies to Avoid — nifty infographic for the beginning torturer of data.
  2. Transparency Reports Suffering — “The momentum has faded,” says Peter Micek, general counsel with Access Now. The digital rights advocacy group is updating its index of transparency reports, which it last posted in 2016, and this pending revision will document serious stagnation in these disclosures. The worst rollbacks have happened when companies have merged or sold off large parts of their customer base, leaving the people involved doing business with new management that lacks the old management’s commitment to transparency.
  3. How Long Will Unbreakable Commercial Encryption Last? (Lawfare) — I believe the tech companies are slowly losing the battle over encryption. They’ve been able to bottle up legislation in the United States, where the tech lobby represents a domestic industry producing millions of jobs and trillions in personal wealth. But they have not been strong enough to stop the Justice Department from campaigning for lawful access. And now the department is unabashedly encouraging other countries to keep circling the tech industry, biting off more and more in the form of law enforcement mandates. That’s a lot easier in countries where Silicon Valley is seen as an alien and often hostile force, casually destroying domestic industries and mores.
  4. Sentient — an interesting experimental language to describe problems (Prolog-like), with SAT solvers under the hood to find solutions.

Four short links: 1 October 2019

Research, Observability, Self-Enumeration, and Probabilistic Programming

By Nat Torkington
  1. Just Enough Research — a book that comes recommended by Simon Willison.
  2. Observations on Observability — I think the future of operating software systems at scale will look like process engineering. We will rely on continuous signals and look at software systems as dynamical systems. We will embrace similar techniques for process control and continuous improvement. This is why I do not like the term observability as it is currently used. In operating software systems at scale, we may want to reserve the traditional definition of observability as it relates to controllability.
  3. History of Self-Enumerating Pangram Tweet — a history of the work that went into making sentences (and now tweets) that enumerate their contents accurately.
  4. Anatomy of a Probabilistic Programming Framework — I realized that despite knowing a thing or two about Bayesian modeling, I don’t understand how probabilistic programming frameworks are structured, and therefore couldn’t appreciate the sophisticated design work going into PyMC4. So, I trawled through papers, documentation, and source code of various open source probabilistic programming frameworks, and this is what I’ve managed to take away from it.

Four short links: 30 September 2019

CLOUD Act, Ethical Consumption of Bits, TV Tracking, and Long Projects

By Nat Torkington
  1. Stamos on CLOUD Act — cogent and informative set of tweets (words I never thought I would say) from Alex Stamos, with context for the latest piece of Internet regulation to get alarmist and wrong media coverage.
  2. Migrating from Cloudflare — This is pretty cool, and it’s why I’ve used Cloudflare for a few years. However, I don’t really like Cloudflare. I don’t like how they protect hate forums, where mass shootings are planned; I don’t like how they have grown to the point where a huge portion of the internet’s total traffic flows through their infrastructure; I don’t like how un-seriously they treat their responsibilities. So, I wanted to move off. More datapoints for the emerging Ethical Consumption of Bits.
  3. Three Recent Papers on the Tracking in TVs (Arvind Narayanan) — Here’s a doozy: Roku has a “Limit Ad Tracking” option. Turning it on increased the number of tracking servers contacted 🙃 . It did prevent Roku’s AD ID from being leaked, but a whole bunch of other unique IDs are available. Even Pi-hole wasn’t that effective at limiting tracking. (via Hacker News)
  4. Strategies for Long Projects (Ben Brostoff) — Relentless, irrational optimism is the only attitude that works.

Four short links: 27 September 2019

Creative Coding, Collective Social Behavior, Programming Language Research, and Social Media Manipulation

By Nat Torkington
  1. Intro to Creative Coding — this is the repo, also check out p5.js demos and tone.js demos. (via @mattdesl)
  2. The Dynamics of Collective Social Behavior in a Crowd-Controlled Game — We find that having a fraction of players who do not follow the crowd’s average behavior is key to succeed in the game.
  3. Recent Programming Language Research (SIGPLAN) — These papers showcase PL [Programming Language] connections to areas as diverse as chemical microfluidics, blockchain smart contracts, and automated debugging.
  4. 2019 Global Inventory of Organized Social Media Manipulation — Evidence of organized social media manipulation campaigns, which have taken place in 70 countries, up from 48 countries in 2018 and 28 countries in 2017. Social media has become co-opted by many authoritarian regimes. In 26 countries, computational propaganda is being used as a tool of information control in three distinct ways: to suppress fundamental human rights, discredit political opponents, and drown out dissenting opinions. Again, understand dystopic possible futures for your presently-democratic nation so you design your software to avoid them.

Four short links: 26 September 2019

Censorship, Tiktok, Machine Learning Workspace, and Deepfake Library

By Nat Torkington
  1. Tiktok and Ethnic Cleansing — as hardline governments rise all around the world, the whole “design policy and tools for the worst case environment” is looking a whole lot more salient. There are worse situations in the world than that of Xinjiang’s Uyghurs, but not many. Tiktok’s shadowbans and content blocks are worth learning more about. Also in content moderation news today: Facebook is not fact-checking political speech.
  2. How Tiktok Is Changing the World and You’re Missing It — Imagine if you created a new account on a social network, you had zero followers, and you posted a piece of content, and then you went viral. That would be ridiculous right? Right, it would be ridiculous. But, that’s how TikTok works.
  3. ml-workspace — all-in-one web-based IDE specialized for machine learning and data science.
  4. Contributing Data to Deepfake Detection (Google) — To make this dataset, over the past year we worked with paid and consenting actors to record hundreds of videos. Using publicly available deepfake generation methods, we then created thousands of deepfakes from these videos. The resulting videos, real and fake, comprise our contribution, which we created to directly support deepfake detection efforts. As part of the FaceForensics benchmark, this dataset is now available, free to the research community, for use in developing synthetic video detection methods.

Four short links: 25 September 2019

Cleaning ImageNet, Thumbnails, Tracking Users, and AR Tabletop Gaming

By Nat Torkington
  1. Removing Slurs from ImageNet — The first issue is that WordNet contains offensive synsets that are inappropriate to use as image labels. Although during the construction of ImageNet in 2009 the research team removed any synset explicitly denoted as ‘offensive,’ ‘derogatory,’ ‘pejorative,’ or ‘slur’ in its gloss, this filtering was imperfect and still resulted in inclusion of a number of synsets that are offensive or contain offensive synonyms. […] We are in the process of preparing a new version of ImageNet by removing all the synsets identified as ‘unsafe’ and ‘sensitive’ along with their associated images. This will result in the removal of 600,040 images, leaving 577,244 images in the remaining “safe” person synsets. To see unsafe labelling in action, try ImageNet Roulette and compare pictures of men and women, people with different colored skin, etc.
  2. Thumbor — open-source smart on-demand image cropping, resizing, and filters.
  3. SocialPath — a Django application for gathering social media intelligence on specific username. It checks for Twitter, Instagram, Facebook, Reddit and Stack Overflow. Collected data is sorted according to words frequency, hashtags, timeline, mentions, similar accounts, and presented as charts with the help of D3js. This technique allows me to track darknet users who does not use unique nicknames.
  4. Tilt Five: Holographic Tabletop Gaming (Kickstarter) — an AR gaming project from the remarkable Jeri Ellsworth.

Four short links: 24 September 2019

ICE Companies, Matrix Methods, Malfunctioning Sounds, and Terrorist Definitions

By Nat Torkington
  1. Companies That Work With ICE — interesting to see technology platforms in the conversation about ethical business. I wonder who contracts to RJ Reynolds, or Shell, and whether we’ll see lists of those companies too.
  2. Matrix Methods in Data Analysis, Signal Processing, and Machine Learning — videos of lectures. (via Mat Kelcey)
  3. Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection — In this paper, we present a new dataset of industrial machine sounds that we call a sound dataset for malfunctioning industrial machine investigation and inspection (MIMII dataset). Normal sounds were recorded for different types of industrial machines (i.e., valves, pumps, fans, and slide rails), and to resemble a real-life scenario, various anomalous sounds were recorded (e.g., contamination, leakage, rotating unbalance, and rail damage). The purpose of releasing the MIMII dataset is to assist the machine-learning and signal-processing community with their development of automated facility maintenance. (via the arXiv paper)
  4. Terrorist Definitions and Designations Lists: What Technology Companies Need to Know — short answer: there’s no existing answer that you can just defer to.

Four short links: 23 September 2019

Good at Operations, Neural Speech Recognition, Cryptocurrency Abuses, and Compiler Tools for ML

By Nat Torkington
  1. The Cloud and Open Source (Tim Bray) — (language warning) There is plenty of evidence that you can be a white-hot flaming a**wipe and still ship great software. But (going out on a limb) I don’t think you can be an a**hole and be good at operations.
  2. Espresso: A Fast End-to-end Neural Speech Recognition Toolkit — open source fast end-to-end neural speech recognition toolkit.
  3. An Investigative Study of Cryptocurrency Abuses in the Dark Web — In total, using MFScope we discovered that more than 80% of Bitcoin addresses on the Dark Web were used with malicious intent; their monetary volume was around 180 million USD, and they sent a large sum of their money to several popular cryptocurrency services (e.g., exchange services). Furthermore, we present two real-world unlawful services and demonstrate their Bitcoin transaction traces, which helps in understanding their marketing strategy as well as black money operations.
  4. MLIR — an intermediate representation (IR) system between a language (like C) or library (like TensorFlow) and the compiler backend (like LLVM). It allows code reuse between different compiler stacks of different languages and other performance and usability benefits. MLIR is being developed by Google as an open-source project primarily to improve the support of TensorFlow on different backends but can be used for any language in general.

Four short links: 20 September 2019

Free Faces, Cultural Distance, Connected Pacific, and Number Munger

By Nat Torkington
  1. 100,000 Free Royalty-Free Faces — “generated by AI.”
  2. Beyond WEIRD Psychology: Measuring and Mapping Scales of Cultural and Psychological Distance — since psychological data is dominated by samples drawn from the United States or other WEIRD nations, this tool provides a “WEIRD scale” to assist researchers in systematically extending the existing database of psychological phenomena to more diverse and globally representative samples. As the extreme WEIRDness of the literature begins to dissolve, the tool will become more useful for designing, planning, and justifying a wide range of comparative psychological projects. (via Marginal Revolution)
  3. Connected Pacific — This site reviews the telecommunications environment of the Pacific Islands. It looks at each community’s connectivity to the world: telecommunications, sea freight, air routes, and trade. It provides real-time statistics on provider market share. It considers the complexity of island telecommunications through the mythical nation of Avaiki. Over time, it will be expanded to include data on carrier interconnections and performance to each market’s major trading partners. I love integrated views of data that give you context like this.
  4. Soulver — Quicker to use than a spreadsheet, and smarter and clearer than a traditional calculator.

Four short links: 19 September 2019

Boring Technology, YouTube's Procella, Hong Kong Protest Tech, and Personas

By Nat Torkington
  1. Boring Technology Behind a One-Person Startup — Just dead simple things that actually work.
  2. YouTube’s Procella Database — a new query processing engine built on top of various primitives proprietary to Google. This paper stood out to me for a number of reasons: this is the SQL query engine that runs YouTube, and it has been for a number of years now, and it contains optimizations that run queries faster than BigQuery, by two orders of magnitude in some cases. Every day this software serves 100s of billions of queries that span 10 PB of data. In this blog post, I’ll pick apart the paper and compare its descriptions to my understanding of various Hadoop-related technologies.
  3. Observations on Technology Use in Hong Kong Protests (Maciej Ceglowski) — Telegram is the preferred messenger app among protesters. It’s used for one-on-one messaging between people, among small groups of people to coordinate, and among very large groups to amplify and disseminate information. The polls feature in Telegram is also a way of affirming consensus in group decision-making.
  4. Misusing Personas with the Seven Dwarfs — “Dwarf personas” focus on users’ mental states and should help us understand how they might be personally impacted by the product. They appreciate that users are complex and can’t always be represented by a single persona.

Four short links: 18 September 2019

Topology, VC Buds, ImageNet Roulette, and Social Science One

By Nat Torkington
  1. Extracting Insights from the Shape of Complex Data using Topology (Nature) — three properties of topology that make it useful: coordinate-free, so you can work on data with different coordinate systems; “small” deformation invariance, so it’s resistant to some forms of noise; and it works on the compressed representations of shapes so you can use it to throw away features but still keep fundamental properties of the shape of the data. I’m paraphrasing the intro here.
  2. Getting Tired of Your Friends: The Dynamics of Venture Capital Relationships — Based on the relationships of the top 50 US venture capital firms, this paper focuses on the strengths of relationships and their dynamic evolution. Empirical estimates indicate that having a deeper relationship leads to fewer, not more future co-investments. Moreover, deeper relationships lead to lower exit performance, even after controlling for endogeneity. Interestingly, deeper relationships first lead to lower performance, and subsequently lead to a slowdown in the relationship intensity. The more I know about you, the less I want to do business with you because I know we’ll lose money together.
  3. ImageNet Roulette (Kate Crawford and Trevor Paglen) — show it a picture, it’ll label using ImageNet. A wonderful reality check about the limitations of general-purpose computer perception.
  4. Social Science One Ships — Facebook’s project to release data to registered researchers took longer than expected because they implemented differential privacy (perturbing the data to preserve statistical properties while preventing reconstruction of any one person’s attributes). The data describe web page addresses (URLs) that have been shared on Facebook starting January 1, 2017, up to and including February 19, 2019. URLs are included if shared with public privacy settings more than on average 100 times (±Laplace(5) noise to minimize information leakage). We have conducted post-processing on the URLs … to remove potentially private and/or sensitive data. This paper goes into detail on steps taken to protect privacy.
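
The noisy inclusion threshold in the Social Science One item can be illustrated with a few lines of code. This is a minimal sketch, assuming that "Laplace(5)" means zero-mean Laplace noise with scale 5 added to the share count before comparing against 100; the function names are hypothetical and this is not Facebook's actual pipeline.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution with the given scale.

    The difference of two independent exponential variables, each with mean
    `scale`, follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def passes_noisy_threshold(share_count, threshold=100, scale=5.0, rng=None):
    """Return True if the share count plus Laplace noise exceeds the threshold."""
    rng = rng or random.Random()
    return share_count + laplace_noise(scale, rng) > threshold


def inclusion_rate(share_count, trials=10_000, seed=0):
    """Estimate how often a URL with this true share count would be included."""
    rng = random.Random(seed)
    hits = sum(passes_noisy_threshold(share_count, rng=rng) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    # Illustrative share counts around the threshold.
    for count in (80, 95, 100, 105, 120):
        print(count, inclusion_rate(count))
```

Because of the noise, whether a URL appears in the dataset does not reveal exactly which side of the threshold its true count falls on, which is the information leakage the quoted passage refers to.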