Four Short Links

Nat Torkington's eclectic collection of curated links.

Four short links: 18 April 2019

Geospatial Feature Engineering, 3D Reconstruction, Fast NLP, and Learning the Zork Interpreter Language

  1. Geomancer -- a geospatial feature engineering library. It leverages geospatial data such as OpenStreetMap (OSM) alongside a data warehouse like BigQuery. You can use this to create, share, and iterate geospatial features for your downstream tasks (analysis, modeling, visualization, etc.).
  2. Meshroom -- a free, open source 3D Reconstruction Software based on the AliceVision framework.
  3. BlingFire -- A lightning fast finite state machine and regular expression manipulation library. [...] We use Fire for many linguistic operations inside Bing such as tokenization, multi-word expression matching, unknown word-guessing, stemming / lemmatization, just to mention a few. cf NLTK.
  4. Learning ZIL -- what the Infocom games were written in, decades before Inform. Andrew Plotkin wrote an intro that explains how it sits in the universe. (Note: this is useless but historically interesting.)

Four short links: 17 April 2019

Infocom Source, Twitter Design, New Ways of Seeing, and Software Blowouts

  1. Infocom Source Code Uploaded -- with some version control (retroactively manufactured from different versions of the source code). Uploaded from a hard drive of Infocom material copied at the time of the acquisition. Jason Scott described the contents. See also DECWAR source.
  2. I Kind of Hate Twitter (Jason Lefkowitz) -- a very good product analysis of why Twitter drives unproductive behaviour. Example: Push delivery makes it hard to ignore what people are saying about you. If someone’s talking about you on the web, you have to go into Google and search to find that out. If someone’s talking about you on Twitter, though, it’s very likely right in your face. This can be flattering if people are saying nice things, but if they’re not, it can feel embarrassing and/or painful; and people who are embarrassed or wounded tend to do stupid things, like lash back at the person who did the wounding, that they regret later when the pain has worn off.
  3. New Ways of Seeing -- new BBC show from James Bridle which looks to be great. (via The Guardian)
  4. Why Software Projects Take Longer Than You Think—a Statistical Model -- A reasonable model for the “blowup factor” would be something like a log-normal distribution. If the estimate is one week, then let’s model the real outcome as a random variable distributed according to the log-normal distribution around one week. This has the property that the median of the distribution is exactly one week, but the mean is much larger [...]
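
A minimal sketch of the log-normal point in item 4, assuming a spread of sigma = 1.0 (my choice for illustration, not a figure from the article). The function name `blowup_summary` is hypothetical. It samples real outcomes around a one-week estimate and compares the median with the mean.

```python
import math
import random
import statistics

def blowup_summary(estimate_weeks, sigma, n=200_000, seed=0):
    """Sample log-normal outcomes whose median equals the estimate."""
    rng = random.Random(seed)
    mu = math.log(estimate_weeks)  # the median of a log-normal is exp(mu)
    samples = [rng.lognormvariate(mu, sigma) for _ in range(n)]
    return {
        "median": statistics.median(samples),
        "mean": statistics.fmean(samples),
        # Closed form for the mean: exp(mu + sigma^2 / 2)
        "theoretical_mean": estimate_weeks * math.exp(sigma ** 2 / 2),
    }

# sigma = 1.0 is an assumed spread, chosen only to make the gap visible.
summary = blowup_summary(estimate_weeks=1.0, sigma=1.0)
print(summary)
```

The median stays near one week while the mean is pulled upward by the long right tail, and the gap widens as sigma grows.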

Four short links: 16 April 2019

Data Brokers, AI Research Ethics, Overclaimed Science, and Hardware for ML

  1. Facebook Transparency Tool (Buzzfeed) -- A transparency tool on Facebook inadvertently provides a window into the confusing maze of companies you’ve never heard of who appear to have your data.
  2. Microsoft’s AI Research with Chinese Military University Fuels Concerns (SCMP) -- “The new methods and technologies described in their joint papers could very well be contributing to China’s crackdown on minorities in Xinjiang, for which they are using facial recognition technology,” said Helena Legarda, a research associate at the Mercator Institute for China Studies, who focuses on China’s foreign and security policies.
  3. @justsaysinmice -- points out bogus science claims by adding "in mice" where appropriate. Genius.
  4. What Machine Learning Needs from Hardware (Pete Warden) -- More arithmetic; Inference; Low Precision; Compatibility; Codesign.

Four short links: 15 April 2019

Making a Group, Robot Arms, Human Contact, and a Personal Archive

  1. You Should Organize a Study Group/Book Club/Online Group/Event! Tips on How to Do It (Stephanie Hurlburt) -- good advice on how to get people together.
  2. Berkeley Open Arms -- Berkeley Open Arms manufactures the BLUE robot arm that was developed at UC Berkeley's Robot Learning Lab. Paper (arXiv link).
  3. Human Contact is a Luxury Good (NYT) -- Life for anyone but the very rich—the physical experience of learning, living, and dying—is increasingly mediated by screens. Not only are screens themselves cheap to make, but they also make things cheaper. [...] The rich do not live like this. The rich have grown afraid of screens. They want their children to play with blocks, and tech-free private schools are booming. Humans are more expensive, and rich people are willing and able to pay for them. Conspicuous human interaction—living without a phone for a day, quitting social networks and not answering email—has become a status symbol.
  4. ArchiveBox -- The open source self-hosted web archive. Takes browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more.

Four short links: 12 April 2019

Automating Statistical Analysis, Chinese AI, Data Sovereignty, and Open vs. Government Licensing

  1. Tea: A High-level Language and Runtime System for Automating Statistical Analysis -- In Tea, users express their study design, any parametric assumptions, and their hypotheses. Tea compiles these high-level specifications into a constraint satisfaction problem that determines the set of valid statistical tests, and then executes them to test the hypothesis. Open source.
  2. Chinese AI -- the things that you probably don't realize about Chinese AI, such as the language gap disadvantaging Western researchers. (via BoingBoing)
  3. It's Time to Think about Jurisdictional Data Sovereignty (Kris Constable) -- not something that Americans think about, but which the rest of the world is chewing on.
  4. The Curious Case of Public Sans (Matthew Butterick) -- Public Sans is a derivative work of Libre Franklin, which requires derivatives to be released under the Open Font License (OFL). But work of a government employee or agency is in the public domain. Oof.

Four short links: 11 April 2019

6 Pagers, Ethically Aligned Design, Infrastructure Malware, and IPv6 Scanning

  1. Using 6 Page and 2 Page Documents To Make Organizational Decisions (Ian Nowland) -- 6 pages and 60m meeting, or 2 pages and 30m meeting, with agenda designed to get to "disagree and commit." (via Simon Willison)
  2. Ethically Aligned Design (IEEE) -- a vision for prioritizing human well-being with autonomous and intelligent systems.
  3. Safety Tampering Malware Infects Second Infrastructure Site -- The discovery has unearthed a new set of never-before-seen custom tools that shows the attackers have been operational since as early as 2014. The existence of these tools, and the attackers' demonstrated interest in operational security, lead FireEye researchers to believe there may be other sites beyond the two already known where the Triton attackers were or still are present.
  4. Scanning IPv6 Address Space -- the Mikrotik story is grim.

Four short links: 10 April 2019

iPhone Dominance, Security Keys, Embedded Systems Course, and Better Slack Client

  1. 83% U.S. College Students Have an iPhone -- from Piper Jaffray research. Android: 9%. Presumably 8% of U.S. college teens love their Nokia 3310s.
  2. Phishing and Security Keys -- Security Keys flip this on its head, trading something humans are bad at (noticing subtle differences) for something computers are good at (identifying exact matches). With Security Keys, instead of the user verifying the site, the site has to prove itself to the key. 💻🔐💪 Is this useful for convincing people who need to be convinced that security keys are the way and the light? I don't know, but every bit of ammunition has to help.
  3. Embedded Systems Software Engineering -- a CMU course with notes, exercises, etc.
  4. Ripcord -- Slack (and Discord) client that is not built on browser tech, so it's zippy compared to Slack's own client.

Four short links: 9 April 2019

From Chrome to Edge, Old Web, Public Sans, and The Feedback Fallacy

  1. What Microsoft Removed from Chrome to make Edge (The Verge) -- Microsoft has removed or replaced more than 50 of Google’s services that come as part of Chromium, including things like ad blocking, Google Now, Google Cloud Messaging, and Chrome OS-related services.
  2. It Seems that Google is Forgetting the Old Web -- it seems more correct to say that Google forgets stuff that is more than 10 years old. If this is the case, Google will remember and index a smaller part of the web every year. Google may do so simply because it would be impossible to do more, for economical and/or technological constraints, which sooner or later would also hit its competitors. But this only makes bigger the problem of what to remember, what to forget, and, above all, how and who should remember and forget.
  3. Public Sans -- Open source. A strong, neutral typeface for text or display. From USWDS.
  4. The Feedback Fallacy (HBR) -- identifies three theories underpinning coworker feedback, and shows how they're all wrong. What these three theories have in common is self-centeredness: they take our own expertise and what we are sure is our colleagues’ inexpertise as givens; they assume that my way is necessarily your way. But as it turns out, in extrapolating from what creates our own performance to what might create performance in others, we overreach. Research reveals that none of these theories is true. Gives advice on how to give feedback more effectively, too. At best, this fetish with feedback is good only for correcting mistakes—in the rare cases where the right steps are known and can be evaluated objectively. And at worst, it’s toxic.

Four short links: 8 April 2019

Chinese Livestreaming, Tech and Teens, YouTube Professionalizing, and Inclusive Meetings

  1. Inside the Dystopian Reality of China's Livestreaming Craze -- Livestreaming exacts a huge mental toll on the people who do it. It’s easy money, but also toxic. Overeggs the dystopia (all interaction is a performance, professional interaction no less so), but is still a quick precis of where livestreaming is at in China. As for the toxic money, just ask Justin Kan.
  2. Screens, Teens, and Psychological Well-Being: Evidence From Three Time-Use-Diary Studies -- We found little evidence for substantial negative associations between digital-screen engagement—measured throughout the day or particularly before bedtime—and adolescent well-being.
  3. The Golden Age of YouTube is Over (The Verge) -- By promoting videos that meet certain criteria, YouTube tips the scales in favor of organizations or creators—big ones, mostly—that can meet those standards. My favorite part is where YouTube refers to the people who made it popular as our endemic creators, a phrase that'd make Orwell stabbier than usual.
  4. Inclusive Scientific Meetings -- This document presents some concrete recommendations for how to incorporate inclusion and equity practices into scientific meetings, from the ground up. This document includes three sections: planning the meeting; during the meeting; and assessing the meeting. A great cheatsheet that applies to non-science meetings, too.

Four short links: 5 April 2019

DIY Bio, Perl, Knowledge Graph Learning, and Amazon Memos

  1. Engineering Proteins in the Cloud -- Amazingly, we're pretty close to being able to create any protein we want from the comfort of our Jupyter Notebooks, thanks to developments in genomics, synthetic biology, and most recently, cloud labs. In this article, I'll develop Python code that will take me from an idea for a protein all the way to expression of the protein in a bacterial cell, all without touching a pipette or talking to a human. The total cost will only be a few hundred dollars! Using Vijay Pande from A16Z's terminology, this is Bio 2.0.
  2. 93% of Paint Splatters are Valid Perl Programs (Colin McMillen) -- tongue-in-cheek, but clever. I, of course, am fluent in those paint splatters. Have written a best-selling book on executable paint splatters. I should feel called-out, I guess, but it's too funny for me to feel much pain.
  3. AmpliGraph -- Python library for representation learning on knowledge graphs. [...] Use AmpliGraph if you need to: (1) Discover new knowledge from an existing knowledge graph. (2) Complete large knowledge graphs with missing statements. (3) Generate stand-alone knowledge graph embeddings. (4) Develop and evaluate a new relational model.
  4. Writing Docs at Amazon -- how to write those famous six-page narrative memos as preparation for meeting with Jeff Bezos, from someone who was there. As much about the meetings as the memos, as it should be.

Four short links: 4 April 2019

Language Creators, Undersea Cable, Open Source Trends, and Making Math Questions

  1. A Conversation with Language Creators: Guido, James, Anders, and Larry (YouTube) -- A lot of people make the mistake of thinking that languages move at the same speed as hardware or all of the other technologies we live with. But languages are much more like math and much more like the human brain, and they all have evolved slowly. And we're still programming in languages that were invented 50 years ago. All the principles of functional programming were thought of more than 50 years ago.
  2. Undersea Internet Cables and Big Internet Companies (APNIC) -- interesting numbers. Between 2016 and 2020, about 100 new cables have been laid or planned. [...] The unit cost is cheaper for new cables than old cables whose lit capacity is increased. [...] In the last five years, the cables that are partly owned by Google, Facebook, Microsoft, and Amazon have risen eight-fold, and there are more such cables in the pipeline. These content providers also consume over 50% of all international bandwidth, and TeleGeography projects that by 2027, they could consume over 80%.
  3. Making Sense of a Crazy Year in Open Source -- if you haven't kept your eye on the latest weirdness in open source licensing (as companies attempt to squeeze commercial leverage from licenses), this is a great intro. Elastic CEO Shay Banon summed it up, saying: “We now have three tiers: open source and free, free but under a proprietary license, and paid under a proprietary license.”
  4. Mathematics Data Set (GitHub) -- This data set code generates mathematical question and answer pairs, from a range of question types at roughly school-level difficulty. This is designed to test the mathematical learning and algebraic reasoning skills of learning models. Not what Dan Meyer would call good problems, mind you!

Four short links: 3 April 2019

HTML DRM, Toxic Incentives, Moral Crumple Zones, and Stats + Symbols

  1. The Effects of HTML's DRM -- middlemen DRM vendors can say "no" to your software playing video.
  2. YouTube Executives Ignored Warnings, Letting Toxic Videos Run Rampant (Bloomberg) -- The company spent years chasing one business goal above others: “Engagement,” a measure of the views, time spent and interactions with online videos. Conversations with over 20 people who work at, or recently left, YouTube reveal a corporate leadership unable or unwilling to act on these internal alarms for fear of throttling engagement. How you incentivize your product managers matters.
  3. Moral Crumple Zones: Cautionary Tales in Human-Robot Interaction -- Just as the crumple zone in a car is designed to absorb the force of impact in a crash, the human in a highly complex and automated system may become simply a component—accidentally or intentionally—that bears the brunt of the moral and legal responsibilities when the overall system malfunctions.
  4. Combining Symbols and Statistics So Machines Can Reason About What They See (MIT) -- overview of a paper that combines reasoning (symbols) with perception (statistics). Combining the two is one piece of progressing AI.

Four short links: 2 April 2019

Content Moderation, Speech in 1.6kbps, Science is Hard, and Forensic Typography

  1. Your Speech, Their Rules: Meet the People Who Guard the Internet (Medium) -- Adam: “Six months ago we told you, ‘Don’t pave the city with banana peels.’ You decided, ‘Let’s see what happens if we pave the city with banana peels.’ We are now here to clean up the injuries.”
  2. A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet -- this is witchcraft. Skip straight to the demos and have your mind blown. 8kb/s used to be the norm for crappy audio, but this is better quality in 20% of the bandwidth.
  3. Statistically Controlling for Confounding Constructs Is Harder than You Think -- Counterintuitively, we find that error rates are highest—in some cases approaching 100%—when sample sizes are large and reliability is moderate. Our findings suggest that a potentially large proportion of incremental validity claims made in the literature are spurious.
  4. Forensic DEC CRT Typography -- recreating the real look of a VT100.

Four short links: 1 April 2019

Communist RuneScape, API Versioning, Computer Graphics, and User Stories

  1. The Communist Revolution inside RuneScape (Emilie Rākete) -- In 2007, a communist RuneScape clan was formed to bring proletarian rule to Server 32 of the world of Gielinor. In a context of scattered clan infighting, the RuneScape communist party was a rampantly victorious social force. Under the wise leadership of SireZaros, the communists waged a revolutionary struggle against reactionary and bourgeois clans that saw more than 5,000 player characters killed in the fighting.
  2. Back-end/Front-end Versioning (Christian Findlay) -- A submission can be rejected [from Google/Apple App Store] for any number of reasons, and it can take up to several days for any one submission to reach the store. On top of this, any user can choose to delay an upgrade, and many users will be on older phones that are not compatible with your current front-end API version. This leaves a situation where front-end versions may be out of sync with each other, or out of sync with the latest back-end version. Here is a quick look at two patterns that might emerge as a strategy to solve the problem.
  3. Introduction to Computer Graphics -- a free, online textbook covering the fundamentals of computer graphics and computer graphics programming.
  4. Engineering Guide to Writing User Stories -- the headings are: Using consistent language; Users do not want your stuff; Removing technical details; Clarifying roles; Making user stories verifiable; Spotting the incompleteness; Ranking user stories.

Four short links: 29 March 2019

Programming Languages, Asset Graphing, Statistical Tests, and Embeddable WebAssembly

  1. Programmer Migration Patterns -- I made a little flow chart of mainstream programming languages and how programmers seem to move from one to another.
  2. cartography -- a Python tool that consolidates infrastructure assets and the relationships between them in an intuitive graph view powered by a Neo4j database. Video.
  3. Common statistical tests are linear models (or: how to teach stats) -- the linear models underlying common parametric and non-parametric tests. Formulating all the tests in the same language highlights the many similarities between them.
  4. lucet -- a native WebAssembly compiler and runtime. It is designed to safely execute untrusted WebAssembly programs inside your application. Open source, from Fastly. Announcement.
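
To make item 3 concrete, here is a small sketch of one such equivalence: the two-sample Student's t-test (equal variances) gives the same t statistic as the slope of a least-squares line fitted to a 0/1 group indicator. The data and the helper names `students_t` and `slope_t` are made up for illustration and are not taken from the linked post.

```python
import math
import statistics

def students_t(a, b):
    """Classic two-sample Student's t statistic (pooled variance)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(b) - statistics.fmean(a)) / math.sqrt(
        pooled * (1 / na + 1 / nb))

def slope_t(x, y):
    """t statistic for the slope of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # Standard error of the slope: sqrt(residual variance / sxx)
    return b1 / math.sqrt(rss / (n - 2) / sxx)

# Made-up measurements for two groups.
group_a = [4.1, 5.0, 3.8, 4.6, 5.2]
group_b = [5.9, 6.3, 5.1, 6.8, 6.0, 5.5]

# Encode group membership as a dummy variable: 0 for a, 1 for b.
x = [0] * len(group_a) + [1] * len(group_b)
y = group_a + group_b

t_classic = students_t(group_a, group_b)
t_linear = slope_t(x, y)
print(t_classic, t_linear)
```

With a 0/1 indicator the fitted slope is the difference in group means and the residual variance is the pooled variance, so the two statistics agree up to floating-point error.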

Four short links: 28 March 2019

Data-Oriented Design, Time Zone Hell, Music Algorithms, and Fairness in ML

  1. Data Oriented Design -- A curated list of data-oriented design resources.
  2. Storing UTC is Not a Silver Bullet -- time zones will drive you to drink.
  3. Warner Music Signed an Algorithm to a Record Deal (The Verge) -- Although Endel signed a deal with Warner, the deal is crucially not for “an algorithm,” and Warner is not in control of Endel’s product. The label approached Endel with a distribution deal and Endel used its algorithm to create 600 short tracks on 20 albums that were then put on streaming services, returning a 50/50 royalty split to Endel. Unlike a typical major label record deal, Endel didn’t get any advance money paid upfront, and it retained ownership of the master recordings.
  4. 50 Years of Unfairness: Lessons for Machine Learning -- We trace how the notion of fairness has been defined within the testing communities of education and hiring over the past half century, exploring the cultural and social context in which different fairness definitions have emerged. In some cases, earlier definitions of fairness are similar or identical to definitions of fairness in current machine learning research, and foreshadow current formal work. In other cases, insights into what fairness means and how to measure it have largely gone overlooked. We compare past and current notions of fairness along several dimensions, including the fairness criteria, the focus of the criteria (e.g., a test, a model, or its use), the relationship of fairness to individuals, groups, and subgroups, and the mathematical method for measuring fairness (e.g., classification, regression). This work points the way toward future research and measurement of (un)fairness that builds from our modern understanding of fairness while incorporating insights from the past.

Four short links: 27 March 2019

Linkers and Loaders, Low-Low-Low Power Bluetooth, Voice, and NVC

  1. Linkers and Loaders -- the uncorrected manuscript chapters for my Linkers and Loaders, published by Morgan Kaufmann.
  2. <1mW Bluetooth LE Transmitter -- Consuming just 0.6 milliwatts during transmission, it would broadcast for 11 years using a typical 5.8-mm coin battery. Such a millimeter-scale BLE radio would allow these ant-sized sensors to communicate with ordinary equipment, even a smartphone. Ingenious engineering hacks to make this work.
  3. Mumble -- an open source, low-latency, high-quality voice chat software primarily intended for use while gaming.
  4. A Guide to Difficult Conversations (Dave Bailey) -- your quarterly reminder that non-violent communication exists and is a good thing.

Four short links: 26 March 2019

Software Stack, Gig Economy, Simple Over Flexible, and Packet Radio

  1. Thoughts on Conway's Law and the Software Stack (Jessie Frazelle) -- All these problems are not small by any means. They are miscommunications at various layers of the stack. They are people thinking an interface or feature is secure when it is merely a window dressing that can be bypassed with just a bit more knowledge about the stack. I really like the advice Lea Kissner gave: “take the long view, not just the broad view.” We should do this more often when building systems.
  2. Troubles with the Open Source Gig Economy and Sustainability Tip Jar (Chris Aniszczyk) -- thoughtful long essay with a lot of links for background reading, on the challenges of sustainability via Patreon, etc., through to some signs of possibly-working models.
  3. Choose Simple Solutions Over Flexible Ones -- flexibility does not come for free.
  4. New Packet Radio (Hackaday) -- a custom radio protocol, designed to transport bidirectional IP traffic over 430MHz radio links (ham radio). This protocol is optimized for "point to multipoint" topology, with the help of managed-TDMA. Note that Hacker News commenters indicate some possible FCC violations; though, as the project comes from France, that's probably not a problem for the creators of the software.

Four short links: 25 March 2019

Hiring for Neurodiversity, Reprogrammable Molecular Computing, Retro UUCP, and Industrial Go

  1. Dell's Neurodiversity Program -- excellent work from Dell making themselves an attractive destination for folks on the autistic spectrum.
  2. Reprogrammable Molecular Computing System (Caltech) -- The researchers were able to experimentally demonstrate 6-bit molecular algorithms for a diverse set of tasks. In mathematics, their circuits tested inputs to assess if they were multiples of three, performed equality checks, and counted to 63. Other circuits drew "pictures" on the DNA "scarves," such as a zigzag, a double helix, and irregularly spaced diamonds. Probabilistic behaviors were also demonstrated, including random walks as well as a clever algorithm (originally developed by computer pioneer John von Neumann) for obtaining a fair 50/50 random choice from a biased coin. Paper.
  3. Dataforge UUCP -- it's like Cory Doctorow guestwrote our timeline: UUCP over SSH to give decentralized comms for freedom fighters.
  4. Go for Industrial Programming (Peter Bourgon) -- I’m speaking today about programming in an industrial context. By that I mean: in a startup or corporate environment; within a team where engineers come and go; on code that outlives any single engineer; and serving highly mutable business requirements. [...] I’ve tried to select for areas that have routinely tripped up new and intermediate Gophers in organizations I’ve been a part of, and particularly those things that may have nonobvious or subtle implications. (via ceej)
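
The von Neumann trick mentioned in item 2 is easy to sketch in ordinary software (this is the textbook algorithm, not the DNA implementation from the paper). Flip the biased coin twice; if the two flips differ, keep the first; if they match, discard both and try again. The 0.8 bias and the names `fair_bit` and `biased_flip` are assumptions for illustration.

```python
import random

def fair_bit(flip):
    """Return 0 or 1 with equal probability using a biased coin `flip`."""
    while True:
        a, b = flip(), flip()
        # P(1 then 0) == P(0 then 1) == p * (1 - p), so unequal pairs are fair.
        if a != b:
            return a

rng = random.Random(42)

def biased_flip(p_heads=0.8):
    """A deliberately unfair coin: 1 with probability p_heads (assumed 0.8)."""
    return 1 if rng.random() < p_heads else 0

bits = [fair_bit(biased_flip) for _ in range(20_000)]
fair_fraction = sum(bits) / len(bits)
print(fair_fraction)
```

The cost is wasted flips: the more biased the coin, the more pairs get thrown away before an unequal pair shows up.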

Four short links: 22 March 2019

Explainable AI, Product Management, REPL for Games, and Open Source Inventory

  1. XAI -- An explainability toolbox for machine learning. Follows the Institute for Ethical AI & Machine Learning's 8 principles.
  2. The Producer Playbook -- Guidelines and best practices for producers and project managers.
  3. Repl.it Adds Graphics -- PyGame in the browser, with fast turnaround.
  4. ScanCode Toolkit -- detects licenses, copyrights, package manifests and dependencies, and more by scanning code ... to discover and inventory open source and third-party packages used in your code.