Four Short Links

Nat Torkington's eclectic collection of curated links.

Four short links: 25 April 2018

Music Biz, Amazon DNS Hijack, Embedded Platform, and Tech Change

  1. Music Industry's "Fantastic 2017" -- That $1.4 billion of growth puts the global total just below 2008 levels ($17.7 billion), meaning that the decline wrought through much of the last 10 years has been expunged. The recorded music business is locked firmly in growth mode, following nearly $1 billion growth in 2016. Cory Doctorow makes the point that while the "music industry" is booming, artist incomes aren't growing at the same rate. Or, indeed, at all.
  2. Amazon's DNS Hijacked For Two Hours -- in service of raiding a cryptocurrency website.
  3. Nerves -- Pack your whole application into as little as 12MB and have it start in seconds by booting a lean cross-compiled Linux directly to the battle-hardened Erlang VM. Let Nerves take care of the network, discovery, I/O, firmware updates, and more. Focus on what matters, and have fun writing robust and maintainable software. Nifty approach to a very real problem.
  4. Five Things We Need to Know About Technological Change (Neil Postman) -- this is incredibly prescient and good. Technological change is not additive; it is ecological.[...] A new medium does not add something; it changes everything. In the year 1500, after the printing press was invented, you did not have old Europe plus the printing press. You had a different Europe. After television, America was not America plus television. Television gave a new coloration to every political campaign, to every home, to every school, to every church, to every industry, and so on. That is why we must be cautious about technological innovation. The consequences of technological change are always vast, often unpredictable, and largely irreversible. See also a related talk by Postman. (via Daniel G. Siegel)
  5. Note: The email edition of Four Short Links will be discontinued on Monday, April 30. New editions of Four Short Links will still be published every weekday at oreilly.com/4sl and through the Four Short Links feed. Please send questions about this change to onlinecap@oreilly.com.

Four short links: 24 April 2018

IoT, Migrations, Prisoner's Dilemma, and Security

  1. IoT Inspector -- The Princeton University research team is digging into the traffic that IoT devices generate, to identify malicious or otherwise dodgy behaviour. They want to know what IoT devices you have so they can test them. They'll release their packet capture and analysis tool as open source. (via BoingBoing)
  2. Migrations (Will Larson) -- very good explanation of how to manage migrations, which are usually the only available avenue to make meaningful progress on technical debt. (via Simon Willison)
  3. Beating the Prisoner's Dilemma -- In 2013 as the semester ended in December, students in Fröhlich’s "Intermediate Programming," "Computer System Fundamentals," and "Introduction to Programming for Scientists and Engineers" classes decided to test the limits of the policy, and collectively planned to boycott the final. Because they all did, a zero was the highest score in each of the three classes, which, by the rules of Fröhlich’s curve, meant every student received an A. How did they manage to avoid defection? (If just one student sat the test, that person would get an A and everyone else would fail.) The students waited outside the rooms to make sure that others honored the boycott, and were poised to go in if someone [broke the pact]. No one did, though. Prisoner's Dilemma only works if the prisoners can't communicate. (via Freakonomics and Ian Miers)
  4. Computer Security: The Achilles' Heel of the Air Force? -- incredibly prescient 1979 article on the important problems of security. The stories of repeatedly improving early systems like GCOS and MULTICS are super-interesting and rich with parallels for today. A contract cannot provide security. Basically, the same GCOS system was selected for a major command and control system. Advocates assured the users that it would be made multilevel secure because security was required by the contract. An extensive tiger team evaluation found there were many deep and complex security flaws that defied practical repair—the computer was finally deemed not only insecure but insecurable.
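
The arithmetic behind item 3 of the 24 April links can be sketched in a few lines. This is a minimal illustration, assuming the curve scales every score relative to the highest score in the class; the source only says that a top score of zero meant everyone received an A, so the exact rule, the `curve` function, and the student names are hypothetical.

```python
def curve(scores):
    """Scale each raw score relative to the highest score in the class.

    Assumption: the top score maps to 100% and scores are non-negative.
    """
    top = max(scores.values())
    if top == 0:
        # Nobody scored above zero, so everyone ties for the top mark.
        return {name: 100.0 for name in scores}
    return {name: 100.0 * score / top for name, score in scores.items()}


# Everyone boycotts: all raw scores are zero, so all students tie at the top.
boycott = curve({"ana": 0, "ben": 0, "cy": 0})

# One student defects and sits the exam: only that student ends up at the top.
defection = curve({"ana": 60, "ben": 0, "cy": 0})

print(boycott)
print(defection)
```

Under this assumed rule a single defector collapses everyone else's grade, which is why the students stood guard outside the exam rooms.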

Four short links: 23 April 2018

Metrics and Incentives, Facebook as Fire Starter, Meeting Mastery, and Weird Chart Types

  1. Heart Surgeons Avoid Difficult Operations to Avoid Poor Performance Rankings -- Just under one-third of the 115 specialists who responded said they had recommended a different treatment path to avoid adding another death to their score. And 84% said they were aware of other surgeons doing the same. Reminds me of MySociety's hard-learned lessons with their MP scorecard, whereby MPs would ask pointless questions in Parliament just to get their numbers up.
  2. When Countries are Tinderboxes, and Facebook is a Match (NYT) -- where institutions are weak or undeveloped, Facebook’s newsfeed can inadvertently amplify dangerous tendencies. Designed to maximize user time on site, it promotes whatever wins the most attention. Posts that tap into negative, primal emotions like anger or fear, studies have found, produce the highest engagement, and so proliferate. Plenty of horrifying examples of lynchings and riots triggered by Facebook posts.
  3. Reflections (Matt Webb) -- Much of any founder's time will be spent meeting advisors and investors. There's a knack to running the room and getting what you want out of it, while maintaining a feeling of collaboration and conversation. Meetings aren't just time you spend in a room together. Meetings are an atomic unit of work. They should have purpose and outcomes, although these don't necessarily need to be stated. There are a lot of small ways to make sure attendees don't drift or feel lost. Really fascinating notes about how he coaches his founders through the incubator program.
  4. Xeno.graphics -- weird but (sometimes) useful charts.

Four short links: 20 April 2018

Functional Programming, High-Dimensional Data, Games and Datavis, and Container Management

  1. Interview with Simon Peyton-Jones -- I had always assumed that the more bleeding-edge changes to the type system, things like type-level functions, generalized algebraic data types (GADTs), higher rank polymorphism, and existential data types, would be picked up and used enthusiastically by Ph.D. students in search of a topic, but not really used much in industry. But in fact, it turns out that people in companies are using some of these still-not-terribly-stable extensions. I think it's because people in companies are writing software that they want to still be able to maintain and modify in five years' time. SPJ is one of the principal designers of Haskell and a lead developer of its GHC compiler, and one of the leading thinkers in functional programming.
  2. HyperTools -- A Python toolbox for visualizing and manipulating high-dimensional data. Open source. High-dimensional = "a lot of columns in each row".
  3. What Videogames Have to Teach Us About Data Visualization -- super-interesting exploration of space, storytelling, structure, and annotations.
  4. Titus -- Netflix open-sourced their container management platform. There aren't many companies with the scale problems of Amazon, Netflix, Google, etc., so it's always interesting to see what comes out of them.

Four short links: 19 April 2018

Free Multics, Community Relevance, Speech Synthesis, and Dandelion Data

  1. BAN.AI Multics -- free multiuser Multics (predecessor to Unix) emulation. This Multics guide will be useful.
  2. The Art of Relevance -- explores how mission-driven organizations can matter more to more people. The book is packed with inspiring examples, rags-to-relevance case studies, research-based frameworks, and practical advice on how your work can be more vital to your community. Should be read by startups (relevant to your customers?) and anyone who is trying to build a community around their software. Text available for free online, print versions still available for purchase.
  3. VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop -- We present a new neural text to speech (TTS) method that is able to transform text to speech in voices that are sampled in the wild. Unlike other systems, our solution is able to deal with unconstrained voice samples and without requiring aligned phonemes or linguistic features. The Presidential voices are impressive. Code and paper available.
  4. No Boundaries for Facebook Data -- Today we report yet another type of surreptitious data collection by third-party scripts that we discovered: the exfiltration of personal identifiers from websites through “login with Facebook” and other such social login APIs. Specifically, we found two types of vulnerabilities: seven third parties abuse websites’ access to Facebook user data; one third party uses its own Facebook “application” to track users around the web.

Four short links: 18 April 2018

Open Source Slack-alike, Open Source MailChimp-alike, DeepFake PSA, and Secure Devices

  1. Zulip -- FOSS Slack-type chat.
  2. Mailtrain -- self-hosted GPLv3 MailChimp-style newsletter service that you can hook up to your favorite mail service (e.g., Mailgun).
  3. Fake News PSA -- DeepFake video of Barack Obama saying things that Obama never said (made for Buzzfeed).
  4. Seven Properties of Highly Secure Devices (Microsoft) -- Hardware-based root of trust; small trusted computing base; defense in depth; compartmentalization; certificate-based authentication; renewable security; failure reporting.

Four short links: 17 April 2018

Dubsteganography, Parsing History, Hackin' the Jack In, and Model Bias

  1. Hide Data in Dubstep Drops -- the blog post shows how to use it. Skrillex meets steganography!
  2. Parsing Timeline -- wonderfully detailed, yet it reads almost chatty. Interesting and informative.
  3. Securing Wireless Neurostimulators -- a hack and discussion of the risk of insecure implantable medical devices that interface with the brain. (via Paper a Day)
  4. Text Embedding Models Contain Bias (Google) -- great to see this making its way to research outputs, instead of being the province of damage control and bad PR. The Developers section of the Semantic Experiences microsite talks about "unwanted associations": In Semantris, the list of words we're showing are hand curated and reviewed. To the extent possible, we've excluded topics and entities that we think particularly invite unwanted associations, or can easily complement them as inputs. In Talk to Books, while we can't manually vet each sentence of 100,000 volumes, we use a popularity measure which increases the proportion of volumes that are published by professional publishing houses. There are additional measures that could be taken. For example, a toxicity classifier or sensitive topics classifier could determine when the input or the output is something that may be objectionable or party to an unwanted association. We recommend taking bias-impact mitigation steps when crafting end-user applications built with these models.

Four short links: 16 April 2018

Light-Powered Camera, Government Blogging, TensorFlow.js, and Metanotation

  1. Light-Powered Camera -- prototype gets 15 frames/second, no external power. The light is used for both image sensing and solar power.
  2. Government Blogs and Government Bloggers (Public Strategist) -- the blogging spectrum 2x2 is solid and explains why government blogs are often about prototypes, not operations.
  3. Introducing TensorFlow.js -- an open source library you can use to define, train, and run machine learning models entirely in the browser, using JavaScript and a high-level layers API.
  4. It's Time for a New Old Programming Language (YouTube) -- Guy L. Steele Jr.'s talk about the Computer Science Metanotation that CS papers use to indicate programs without having to use a specific programming language. This is one for your inner CS meta-nerd.

Four short links: 13 April 2018

Compositing, Exfiltrating, Listening, and Munging

  1. Deep Painterly Harmonisation -- composite and preserve the style of the destination image. The examples are impressive.
  2. PowerHammer: Exfiltrate Data Over Power Lines -- In this case, a malicious code running on a compromised computer can control the power consumption of the system by intentionally regulating the CPU utilization. Data is modulated, encoded, and transmitted on top of the current flow fluctuations, and then it is conducted and propagated through the power lines.
  3. Learn To Listen At The Cocktail Party -- We present a joint audio-visual model for isolating a single speech signal from a mixture of sounds such as other speakers and background noise. Solving this task using only audio as input is extremely challenging and does not provide an association of the separated speech signals with speakers in the video. In this paper, we present a deep network-based model that incorporates both visual and auditory signals to solve this task. The visual features are used to "focus" the audio on desired speakers in a scene and to improve the speech separation quality.
  4. prototool -- a Swiss Army Knife for protocol buffers.

Four short links: 12 April 2018

Probabilistic Programming, Bad Copyright, Technical Debt, and Video Data Set

  1. TensorFlow Probability -- a probabilistic programming toolbox for machine learning.
  2. European Copyright Law Isn't Great. It Could Soon Get a Lot Worse. (EFF) -- The practical effect of this could be to make it impossible for a news publisher to publish their stories for free use, for example by using a Creative Commons license. (via BoingBoing)
  3. A Taxonomy of Technical Debt -- you can argue about whether his categories are your categories, but it's useful to have words for the nuance.
  4. Moments in Time Data Set -- A large-scale data set for recognizing and understanding action in videos. (via MIT News)

Four short links: 11 April 2018

Assignment, Warranties, Data, and Public Goods

  1. Why Does "=" Mean Assignment -- marvellous history lesson.
  2. Warranty Void if Removed Stickers Are Bull -- Federal law says you can repair your own things, and manufacturers cannot force you to use their own repair services. (via BoingBoing)
  3. TXR -- a pattern language and a Lisp variant for data problems.
  4. Roman Roads and Persistence in Development -- In some ways, the emergence of the Roman road network is almost a natural experiment—in light of the military purpose of the roads, the preferred straightness of their construction, and their construction in newly conquered and often undeveloped regions. This type of public good seems to have had a persistent influence on subsequent public good allocations and comparative development. At the same time, the abandonment of the wheel shock in MENA appears to have been powerful enough to cause that degree of persistence to break down. Overall, our analysis suggests that a public good provision is a powerful channel through which persistence in comparative development comes about. I wonder whether this kind of analysis is even conceivable with internet public policy like broadband, coding classes, and laws. (via BoingBoing)

Four short links: 10 April 2018

Deep Learning Learnings, Reverse Engineering WhatsApp, Database Client, and Social Science

  1. Lessons Learned Reproducing a Deep Reinforcement Learning Paper -- REALLY good retrospective on eight months reproducing a paper, with lots of lessons learned, like: starting a reinforcement learning project, you should expect to get stuck like you get stuck on a math problem. It’s not like my experience of programming in general so far where you get stuck but there’s usually a clear trail to follow and you can get unstuck within a couple of days at most. It’s more like when you’re trying to solve a puzzle, there are no clear inroads into the problem, and the only way to proceed is to try things until you find the key piece of evidence or get the key spark that lets you figure it out.
  2. Reverse Engineering WhatsApp -- This project intends to provide a complete description and re-implementation of the WhatsApp Web API, which will eventually lead to a custom client. WhatsApp Web internally works using WebSockets; this project does as well.
  3. DatabaseFlow -- an open source self-hosted SQL client, GraphQL server, and charting application that works with your database. Visualize schemas, query plans, charts, and results. You can run Database Flow locally for your own use, or install to a shared server for the whole team.
  4. Code and Data for the Social Sciences -- This handbook is about translating insights from experts in code and data into practical terms for empirical social scientists.

Four short links: 9 April 2018

Monads, GDPR, Blockchain, and Search

  1. What We Talk About When We Talk About Monads -- This paper is not a monad tutorial. It will not tell you what a monad is. Instead, it helps you understand how computer scientists and programmers talk about monads and why they do so.
  2. Publishers and GDPR -- a nice explanation of what GDPR is bringing to companies like Facebook and Google, how it's changing ad-serving, and what it means for content publishers.
  3. Blockchain is Not Only Crappy Technology But a Bad Vision for the Future -- There is no single person in existence who had a problem they wanted to solve, discovered that an available blockchain solution was the best way to solve it, and therefore became a blockchain enthusiast.
  4. Typesense -- open source typo tolerant search engine that delivers fast and relevant results out of the box.

Four short links: 6 April 2018

Library Management, Flame Graphs, Silent Speech Interface, and Cloud Backup

  1. Thou Shalt Not Depend on Me (ACM) -- with 37% of websites using at least one known vulnerable library, and libraries often being included in quite unexpected ways, there clearly is room for improvement in library handling on the web.
  2. FlameScope -- Netflix's open source visualization tool for exploring different time ranges as Flame Graphs. (via Netflix Tech Blog)
  3. AlterEgo: A Personalized Wearable Silent Speech Interface -- The results from our preliminary experiments show that the accuracy of our silent speech system is at par with the reported word accuracies of state-of-the-art speech recognition systems, in terms of being robust enough to be deployed as voice interfaces, albeit on smaller vocabulary sets. (via MIT News)
  4. Duplicity -- Encrypted bandwidth-efficient backup using the rsync algorithm. Common use case is backing up server to S3, but there's an impressive number of connective services, including Google Drive, Azure, Mega.co, and Dropbox.

Four short links: 5 April 2018

Interactive Notebooks, Molecule-making AI, Interpersonal Dynamics, and JavaScript Motion Library

  1. MyBinder -- Turn a GitHub repo into a collection of interactive notebooks. (via Julia Evans)
  2. Molecule-Making AI (Nature) -- The new AI tool, developed by Marwin Segler, an organic chemist and artificial intelligence researcher at the University of Münster in Germany, and his colleagues, uses deep learning neural networks to imbibe essentially all known single-step organic-chemistry reactions—about 12.4 million of them. This enables it to predict the chemical reactions that can be used in any single step. The tool repeatedly applies these neural networks in planning a multi-step synthesis, deconstructing the desired molecule until it ends up with the available starting reagents. (via Slashdot)
  3. Interpersonal Dynamics -- The list of common corrosive dynamics rang true: bone-deep competition; fear of being found out; my reality is not the reality; it's no fun being the squeaky wheel; feedback stays at the surface; denial that work is personal.
  4. Popmotion -- A functional, flexible JavaScript motion library.

Four short links: 4 April 2018

Forum Software, Data Analytics, Datalog Query, and Online != High-Tech

  1. Spectrum -- open source forum software. (via announcement)
  2. MacroBase -- a data analytics tool that prioritizes attention in large data sets using machine learning [...] specialized for one task: finding and explaining unusual or interesting trends in data.
  3. datahike -- a durable database with an efficient datalog query engine.
  4. Why So Many Online Mattress Brands -- trigger for a rant: software is eating everything, but that doesn't make everything an innovative company. If you're applying the online sales playbook to product X (kombucha, mattresses, yoga mats) it doesn't make you a Level 9 game-changing disruptive TechCo, it makes you a retail business keeping up with the times. I'm curious where the next interesting bits of tech are—@gnat me with your ideas.

Four short links: 3 April 2018

Internet of Battle Things, Program Fuzzing, Data Sheets for Data Sets, and Retro Port

  1. Challenges and Characteristics of Intelligent Autonomy for Internet of Battle Things in Highly Adversarial Environments -- Numerous artificially intelligent, networked things will populate the battlefield of the future, operating in close collaboration with human warfighters, and fighting as teams in highly adversarial environments. This paper explores the characteristics, capabilities, and intelligence required of such a network of intelligent things and humans—Internet of Battle Things (IOBT). It will experience unique challenges that are not yet well addressed by the current generation of AI and machine learning. (via Slashdot)
  2. T-Fuzz: Fuzzing by Program Transformation -- clever! To improve coverage, existing approaches rely on imprecise heuristics or complex input mutation techniques (e.g., symbolic execution or taint analysis) to bypass sanity checks. Our novel method tackles coverage from a different angle: by removing sanity checks in the target program. T-Fuzz leverages a coverage-guided fuzzer to generate inputs. Whenever the fuzzer can no longer trigger new code paths, a lightweight, dynamic tracing-based technique detects the input checks that the fuzzer-generated inputs fail. These checks are then removed from the target program. Fuzzing then continues on the transformed program, allowing the code protected by the removed checks to be triggered and potential bugs discovered.
  3. Data Sheets for Data Sets -- Currently there is no standard way to identify how a data set was created, and what characteristics, motivations, and potential skews it represents. To begin to address this issue, we propose the concept of a data sheet for data sets, a short document to accompany public data sets, commercial APIs, and pretrained models.
  4. Porting Prince of Persia to the BBC Master -- the author of the original 1980s game, Jordan Mechner, found and posted the source code to the Apple II version. These fine folks ported it to a different 1980s computer. I love the creativity of people who hack on small retro systems. I find big web stuff lacks that these days: it's all up-to-your-elbows in frameworks.

Four short links: 2 April 2018

Game Networking, Grep JSON, Voting Ideas, and UIs from Pictures

  1. Valve's Networking Code -- a basic transport layer for games. The features are: connection-oriented protocol (like TCP)...but message-oriented instead of stream-oriented; mix of reliable and unreliable messages; messages can be larger than underlying MTU, the protocol performs fragmentation and reassembly, and retransmission for reliable; bandwidth estimation based on TCP-friendly rate control (RFC 5348); encryption: AES per packet, Ed25519 crypto for key exchange and cert signatures, with the details for shared key derivation and per-packet IV based on Google QUIC; tools for simulating loss and detailed stats measurement.
  2. gron -- grep JSON from the command line.
  3. The Problem With Voting -- I don't agree with all of the analysis, but the proposed techniques are interesting. I did like the term "lazy consensus" where consensus is assumed to be the default state (i.e., “default to yes”). The underlying theory is that most proposals are not interesting enough to discuss. But if anyone does object, a consensus seeking process begins. (via Daniel Bachhuber)
  4. pix2code -- open source code that generates Android, iOS, and web source code for a UI from just a photo. It's not coming for your job any time soon (just over 77% accuracy), but it's still a nifty idea. (via Two Minute Papers)
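
To make the "grep JSON" idea in item 2 concrete, here is a minimal Python sketch of the general technique: flatten a nested document into one line per leaf value so that ordinary line-based filtering works. It is not gron's implementation, and its output format is only loosely modelled on that style; the `flatten` function, the `json` root name, and the sample document are assumptions for illustration.

```python
import json


def flatten(value, path="json"):
    """Yield one 'path = value' line per scalar leaf in a parsed JSON value.

    Simplifications: empty objects and arrays produce no lines, and keys are
    not escaped, so keys containing dots or spaces would be ambiguous.
    """
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}[{index}]")
    else:
        yield f"{path} = {json.dumps(value)}"


doc = json.loads('{"user": {"name": "ada", "tags": ["x", "y"]}, "ok": true}')
lines = list(flatten(doc))

# A plain substring filter now plays the role of grep.
matches = [line for line in lines if "tags" in line]
print("\n".join(matches))
```

Because every line carries its full path, a match tells you where in the document the value lives, which is what makes line-oriented tools useful on nested data.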

Four short links: 30 March 2018

Data Literacy, Data Science Readings, Bloated Data Architectures, and AI Ruins Everything

  1. Data Defenders -- game for grade 4-6 that teaches children and pre-teens the concept of personal information and its economic value, and introduces them to ways to manage and protect their personal information on the websites and apps they enjoy. (via BoingBoing)
  2. Readings in Applied Data Science -- pointers to interesting papers, via Hadley Wickham's Stanford class.
  3. COST: Configuration that Outperforms a Single Thread -- The COST of a given platform for a given problem is the hardware configuration required before the platform outperforms a competent single-threaded implementation. [...] We survey measurements of data-parallel systems recently reported in SOSP and OSDI, and find that many systems have either a surprisingly large COST, often hundreds of cores, or simply underperform one thread for all their reported configurations.
  4. Finding Alternative Musical Scales -- is there nothing that AI cannot improve/ruin? We search for alternative musical scales that share the main advantages of classical scales: pitch frequencies that bear simple ratios to each other, and multiple keys based on an underlying chromatic scale with tempered tuning. We conduct the search by formulating a constraint satisfaction problem that is well suited for solution by constraint programming. We find that certain 11-note scales on a 19-note chromatic stand out as superior to all others. These scales enjoy harmonic and structural possibilities that go significantly beyond what is available in classical scales and therefore provide a possible medium for innovative musical composition. (via Mark J. Nelson)
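
The COST definition quoted in item 3 reduces to a small lookup over benchmark results. The sketch below assumes you already have a single-threaded baseline runtime and the scalable system's runtime at several core counts; the `cost` function and all timings are hypothetical, made up purely to show the calculation.

```python
def cost(single_thread_seconds, system_seconds_by_cores):
    """Return the smallest core count at which the system beats the baseline.

    Returns None when no measured configuration outperforms the
    single-threaded run, the "unbounded COST" case.
    """
    for cores in sorted(system_seconds_by_cores):
        if system_seconds_by_cores[cores] < single_thread_seconds:
            return cores
    return None


# Hypothetical timings: the system needs 128 cores to beat one thread.
scales_eventually = cost(300.0, {1: 1500.0, 16: 600.0, 64: 350.0, 128: 250.0})

# Hypothetical timings: the system never catches up in the measured range.
never_wins = cost(300.0, {1: 900.0, 16: 400.0})

print(scales_eventually, never_wins)
```

The point of the metric is that scaling curves alone can hide a large constant overhead; comparing against a competent single thread exposes it.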

Four short links: 29 March 2018

Facebook Container, Publishing Future, Social Media Ethics, and Online Virality

  1. Facebook Container -- Firefox add-on that isolates your Facebook identity from the rest of your web activity. When you install it, you will continue to be able to use Facebook normally. Facebook can continue to deliver their service to you and send you advertising. The difference is that it will be much harder for Facebook to use your activity collected off Facebook to send you ads and other targeted messages.
  2. What's Coming for Online Publishing (Doc Searls) -- What will happen when the Times, the New Yorker, and other pubs own up to the simple fact that they are just as guilty as Facebook of leaking its readers’ data to other parties, for—in many if not most cases—God knows what purposes besides “interest-based” advertising? (via Piers Harding)
  3. Affiliate Marketing Not Disclosed on Social Media (Freedom to Tinker) -- Of all the YouTube videos and Pinterest pins that contained affiliate links, only ~10% and ~7% respectively contained accompanying disclosures. (paper)
  4. The Structural Virality of Online Diffusion -- Indeed, the very label “viral hit” implies precisely the exponential spreading of the sort observed in contagion models in their supercritical regime. It is therefore notable that essentially everything we observe, including the very largest and rarest events, can be accounted for by a simple model operating entirely in the low infectiousness parameter regime.