Four Short Links

Nat Torkington's eclectic collection of curated links.

Four short links: 12 December 2017

Learned Indexes, Text Tables, Weaponized Ed Data, and Bad Feedback Loops

  1. The Case for Learned Index Structures -- Our initial results show that by using neural nets, we are able to outperform cache-optimized B-Trees by up to 70% in speed while saving an order-of-magnitude in memory over several real-world data sets. More importantly, though, we believe that the idea of replacing core components of a data management system through learned models has far-reaching implications for future systems designs and that this work just provides a glimpse of what might be possible. (via Simon Willison)
  2. tty-table -- displays ASCII tables in your terminal emulator or browser console. Word wrap, padding, alignment, colors, Asian character support, per-column callbacks, and you can pass rows as objects or arrays. Backward compatible with Automattic/cli-table.
  3. Weaponization of Ed Data (Audrey Watters) -- 2017 made it clear, I’d like to think, that the dangers of education technology and its penchant for data collection aren’t simply a matter of a potential loss of privacy or a potential loss of data. The stakes now are much, much higher.
  4. Money as Instrument of Change (YouTube) -- asked about exploiting human behaviour in social media, the former VP of User Growth, Mobile & International at Facebook says, The short-term dopamine-driven feedback loops that we have created are destroying how society works. The whole talk is interesting. And sweary. (via Gizmodo)
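
To make the learned-index idea in item 1 concrete, here is a minimal sketch, not the paper's method: the paper uses neural nets, while this toy swaps in a single least-squares line. The class name `LearnedIndex` and its methods are hypothetical, and the keys are assumed to be a non-empty list of numbers. The model guesses where a key sits in a sorted array, and a search bounded by the model's worst observed error corrects the guess.

```python
import bisect


class LearnedIndex:
    """Toy learned index: a linear model predicts a key's position in a
    sorted array, and a bounded binary search corrects the prediction."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        # Fit position ~ slope * key + intercept by ordinary least squares.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Record the worst prediction error so lookups know how far to search.
        self.max_err = max(
            abs(i - self._predict(k)) for i, k in enumerate(self.keys)
        )

    def _predict(self, key):
        # Round the model output and clamp it to a valid array position.
        pos = int(round(self.slope * key + self.intercept))
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the index of key in the sorted array, or None if absent."""
        guess = self._predict(key)
        lo = max(guess - self.max_err, 0)
        hi = min(guess + self.max_err + 1, len(self.keys))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None
```

The model only narrows the search window; correctness comes from the recorded error bound, so a poor fit costs speed rather than accuracy.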

Four short links: 11 December 2017

Programming Falsehoods, Money Laundering, Vulnerability Markets, and Algorithmic Transparency

  1. Falsehoods Programmers Believe About Programming -- I feel like "understanding programming" is like learning about science in school: it's a progressive series of "well, actually it's more complicated than that" until you're left questioning your own existence. (Descartes would tell us computo ergo sum.)
  2. Kleptocrat -- You are a corrupt politician, and you just got paid. Can you hide your dirty money from The Investigator and cover your tracks well enough to enjoy it? The game is made by a global investigative firm that specializes in tracing assets. A+ for using games to Share What You Know. (via BoingBoing)
  3. Economic Factors of Vulnerability Trade and Exploitation -- In this paper, we provide an empirical investigation of the economics of vulnerability exploitation, and the effects of market factors on likelihood of exploit. Our data is collected first-handedly from a prominent Russian cybercrime market where the trading of the most active attack tools reported by the security industry happens. Our findings reveal that exploits in the underground are priced similarly or above vulnerabilities in legitimate bug-hunting programs, and that the refresh cycle of exploits is slower than currently often assumed. On the other hand, cybercriminals are becoming faster at introducing selected vulnerabilities, and the market is in clear expansion both in terms of players, traded exploits, and exploit pricing. We then evaluate the effects of these market variables on likelihood of attack realization, and find strong evidence of the correlation between market activity and exploit deployment. (via Paper a Day)
  4. Principles for Algorithmic Transparency (ACM) -- Awareness; Access and redress; Accountability; Explanation; Data provenance; Auditability; and Validation and Testing. (via Pia Waugh)

Four short links: 8 December 2017

Books for Young Engineers, Fake News, Digital Archaeology, and Bret Victor

  1. Books for Budding Engineers (UCL) -- a great list of books for kids who have a STEM bent.
  2. Data-Driven Analysis of "Fake News" -- In sheer numerical terms, the information to which voters were exposed during the election campaign was overwhelmingly produced not by fake news sites or even by alt-right media sources, but by household names like "The New York Times," "The Washington Post," and CNN. Without discounting the role played by malicious Russian hackers and naïve tech executives, we believe that fixing the information ecosystem is at least as much about improving the real news as it is about stopping the fake stuff. A lot of data to support this conclusion. (via Dean Eckles)
  3. Digital Archaeology -- papers from a conference, whose highlights were tweeted here. In case you thought for a second there was some corner of the world that software wasn't going to eat.
  4. Dynamicland -- rumours of Bret Victor's new AR project about computing with space. See also the Twitter account showing off goodies.

Four short links: 7 December 2017

Measurement, Value, Privacy, and Openness

  1. Emerging Gov Tech: Measurement -- presenters from inside and outside of government to share how they were using measurement to inform decision-making. The hologram reminding people to dump biosecurity material was nifty, and the Whare Hauora project is much needed in a country with a lot of dank draughty houses.
  2. When Is a Dollar Not a Dollar -- a dollar of cost savings is worth one dollar to the customer, but a dollar of extra revenue is usually worth dimes or pennies (depending on the customer's profit margin).
  3. Learning with Privacy at Scale (Apple) -- about their differential privacy work. Their attention to detail is lovely. Whenever an event is generated on-device, the data is immediately privatized via local differential privacy and temporarily stored on-device using data protection, rather than being immediately transmitted to the server. After a delay based on device conditions, the system randomly samples from the differentially private records subject to the above limit and sends the sampled records to the server.
  4. When Open Data is a Trojan Horse: The Weaponization of Transparency in Science and Governance -- We suggest that legislative efforts that invoke the language of data transparency can sometimes function as ‘‘Trojan Horses’’ through which other political goals are pursued. Framing these maneuvers in the language of transparency can be strategic, because approaches that emphasize open access to data carry tremendous appeal, particularly in current political and technological contexts.
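
For a feel of what "privatized via local differential privacy" in item 3 can mean, here is a sketch of classic randomized response on a single bit. This is a textbook mechanism swapped in for illustration, not Apple's algorithm, and the function names `privatize_bit` and `estimate_rate` are hypothetical. Each device flips its answer with a probability set by `epsilon`, so no single report can be trusted, yet the server can still estimate the population rate.

```python
import math
import random


def privatize_bit(bit, epsilon, rng=random):
    """Report the true bit with probability e^eps / (e^eps + 1), else flip it."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1)
    return bit if rng.random() < p_truth else 1 - bit


def estimate_rate(reports, epsilon):
    """Unbiased estimate of the true fraction of 1s from the noisy reports."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    observed = sum(reports) / len(reports)
    # E[observed] = f * p + (1 - f) * (1 - p); solve for f.
    return (observed - (1 - p)) / (2 * p - 1)
```

A smaller `epsilon` means more flipping: stronger privacy per report, but more reports needed for the same accuracy.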

Four short links: 6 December 2017

TouchID for SSH, Pen Testing Checklist, Generativity, and AI Data

  1. SeKey -- an SSH agent that allows users to authenticate to UNIX/Linux SSH servers using the Secure Enclave.
  2. Web Application Penetration Testing Checklist -- a useful checklist of things to poke at if you're doing a hygiene sweep.
  3. The Bullet Hole Misconception -- Computer technology has not yet come close to the printing press in its power to generate radical and substantive thoughts on a social, economical, political, or even philosophical level. I really like this metric of success.
  4. AI Index (Stanford) -- This report aggregates a diverse set of data, makes that data accessible, and includes discussion about what is provided and what is missing. Most importantly, the AI Index 2017 Report is a starting point for the conversation about rigorously measuring activity and progress in AI in the future.

Four short links: 5 December 2017

Analog Computing, Program Synthesis, Midwestern Investment, and Speed Email

  1. A New Analog Computer (IEEE) -- Digital programming made it possible to connect the input of a given analog block to the output of another one, creating a system governed by the equation that had to be solved. No clock was used: voltages and currents evolved continuously rather than in discrete time steps. This computer could solve complex differential equations of one independent variable with an accuracy that was within a few percent of the correct solution.
  2. Barliman -- Barliman is a prototype "smart editor" that performs real-time program synthesis to try to make the programmer's life a little easier. Barliman has several unusual features: given a set of tests for some function foo, Barliman tries to "guess" how to fill in a partially specified definition of foo to make all of the tests pass; given a set of tests for some function foo, Barliman tries to prove that a partially specified definition of foo is inconsistent with one or more of the tests; given a fully or mostly specified definition of some function foo, Barliman will attempt to prove that a partially specified test is consistent with, or inconsistent with, the definition of foo.
  3. Investing in the Midwest (NYT) -- Steve Case closes a fund backed by every tech billionaire you've heard of, for investing in midwestern businesses. Mr. Schmidt of Alphabet said he was sold on the idea from the moment he first heard about it. “I felt it was a no-brainer,” he said. “There is a large selection of relatively undervalued businesses in the heartland between the coasts, some of which can scale quickly.”
  4. Email Like a CEO -- see also How to Write Email with Military Precision. (via Hacker News)

Four short links: 4 December 2017

Campaign Cybersecurity, Generated Games, Copyright-Induced Style, and Tech Ethics

  1. Campaign Cybersecurity Playbook -- The information assembled here is for any campaign in any party. It was designed to give you simple, actionable information that will make your campaign’s information more secure from adversaries trying to attack your organization—and our democracy.
  2. Games By Angelina -- The aim is to develop an AI system that can intelligently design videogames, as part of an investigation into the ways in which software can design creatively. The creator's GitHub account has some interesting procedural generation projects, too. (via MIT Technology Review)
  3. Every Frame a Painting -- Nearly every stylistic decision you see about the channel—the length of the clips, the number of examples, which studios’ films we chose, the way narration and clip audio weave together, the reordering and flipping of shots, the remixing of 5.1 audio, the rhythm and pacing of the overall video—all of that was reverse engineered from YouTube’s Copyright ID. [...] So, something that was designed to restrict us ended up becoming our style. And yet, there were major problems with all of these decisions. We wouldn’t realize it until years later, but by creating such a simple, approachable style that skirted the edge of legality, we pretty much cut ourselves off from our most ambitious topics.
  4. Love the Sin, Hate the Sinner (Cory Doctorow) -- the best review of Tim's new book that I've seen. [T]he reason tech went toxic was because unethical people made unethical choices, but those choices weren't inevitable or irreversible.

Four short links: 1 December 2017

Creepy Kid Videos, Cache Smearing, Single-Image Learning, and Connected Gift Guide

  1. /r/ElsaGate -- Reddit community devoted to understanding and tackling YouTube's creepy kid videos, from business models to software used to create them.
  2. Cache Smearing (Etsy) -- to solve the problem where one key is so powerful it overloads a single server, a technique for turning a single key into multiple so they can be spread over several servers.
  3. Deep Image Prior -- Deep convolutional networks have become a popular tool for image generation and restoration. Generally, their excellent performance is imputed to their ability to learn realistic image priors from a large number of example images. In this paper, we show that, on the contrary, the structure of a generator network is sufficient to capture a great deal of low-level image statistics prior to any learning. In order to do so, we show that a randomly initialized neural network can be used as a handcrafted prior with excellent results in standard inverse problems such as denoising, superresolution, and inpainting. Furthermore, the same prior can be used to invert deep neural representations to diagnose them, and to restore images based on flash/no flash input pairs.
  4. Privacy Not Included (Mozilla) -- shopping guide for connected gifts, to help you know whether they respect your privacy or not. (most: not so much)
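
The cache smearing idea in item 2 can be sketched in a few lines. This is an illustration under assumptions, not necessarily Etsy's implementation: the helper names, the `key:N` suffix format, and the modulo-based server choice are all hypothetical. Reads pick one of several variants of the hot key at random, so the load lands on several servers instead of one.

```python
import hashlib
import random


def smeared_key(key, copies, rng=random):
    """Pick one of `copies` variants of a hot key for this request."""
    return f"{key}:{rng.randrange(copies)}"


def all_smeared_keys(key, copies):
    """Every variant of a key, e.g. for writes or invalidation."""
    return [f"{key}:{i}" for i in range(copies)]


def server_for(key, servers):
    """Map a key to a server with a stable hash (simple modulo, for illustration)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]
```

The trade-off is that a write or invalidation has to touch every variant, which is why `all_smeared_keys` exists alongside the random read path.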

Four short links: 30 November 2017

Object Models, Open Source Voice Recognition, IoT OS, and High-Speed Robot Wars

  1. Object Models -- a (very) brief run through the inner workings of objects in four very dynamic languages. Readable and informative. (via Simon Willison)
  2. Mozilla Releases Open Source Voice Recognition and Voice Data Set -- we have included pre-built packages for Python, NodeJS, and a command-line binary that developers can use right away to experiment with speech recognition. Data set features samples from more than 20,000 people, reflecting a diversity of voices globally.
  3. FreeRTOS -- Amazon adds sync and promises OTA updates. Very clever from Amazon: this is foundational software for IoT.
  4. Japanese Sumo Robots (YouTube) -- omg, the speed of these robots. (via BoingBoing)

Four short links: 29 November 2017

Avoiding State Surveillance, Parallel Algorithms, Smart Tactics, and Voting Security

  1. The Motherboard Guide to Avoiding State Surveillance -- a lot of good advice, even if you're not at risk from a nation state (e.g., don't run your own mail server).
  2. A Library of Parallel Algorithms (CMU) -- what it says on the box. See also CMU's "Algorithm Design: Parallel and Sequential" book.
  3. EFF's Clever Tactic (Cory Doctorow) -- when you argue about DRM, the pro-DRM side always says that all this stuff is an unfortunate side-effect of the law, and that they're really only trying to stop pirates, promise and cross my heart. So, here's what we did at the W3C: we proposed a membership rule that would allow members to use DRM law to sue anyone who infringed their copyrights—but took away their rights to sue people who were breaking DRM for some other reason, like adapting works for people with disabilities, or investigating critical security flaws, or creating legal, innovative new businesses. Needless to say, they didn't go for that proposal, which revealed their true motives.
  4. Cybersecurity of Voting Machines (Matt Blaze) -- his written testimony before Congress. I offer three specific recommendations: (1) Paperless DRE voting machines should be immediately phased out from U.S. elections in favor of systems, such as precinct-counted optical scan ballots, that leave a direct artifact of the voter’s choice. (2) Statistical “risk limiting audits” should be used after every election to detect software failures and attacks. (3) Additional resources, infrastructure, and training should be made available to state and local voting officials to help them more effectively defend their systems against increasingly sophisticated adversaries.

Four short links: 28 November 2017

Code for One, Grid Component, Tinder Data, and Engineering Reorg

  1. Structure -- He wrote Structur. He wrote Alpha. He wrote mini-macros galore. Structur lacked an “e” because, in those days, in the Kedit directory eight letters was the maximum he could use in naming a file. In one form or another, some of these things have come along since, but this was 1984 and the future stopped there. Howard, who died in 2005, was the polar opposite of Bill Gates—in outlook as well as income. Howard thought the computer should be adapted to the individual and not the other way around. One size fits one. The programs he wrote for me were molded like clay to my requirements—an appealing approach to anything called an editor. Personalized software is a wonderful luxury. Programmers forget how rare it is. (via Clive Thompson)
  2. React Data Grid -- open source Excel-like grid component built with React.
  3. What Tinder Knows (Guardian) -- the UK laws that let you request this data are wonderful; without it, we'd have little idea how much of our lives we reveal.
  4. How We Reorganized Instagram’s Engineering Team While Quadrupling Its Size (HBR) -- Once we decided to reorg, the first thing we did was determine our desired outcomes as a team. We gathered our leadership in a room and came up with 20 different outcomes—from speed to cost efficiency—and prioritized them, No. 1 to No. 20. We picked our top five outcomes, which became our organizational principles: Minimize dependencies between teams and code; Have clear accountability with the fewest decision-makers; Groups have clear measures; Top-level organizations have roadmaps; Performance, stability, and code quality have owners.

Four short links: 27 November 2017

PV Growth, Digital Rights, Unit Testing, and Open Source Innovation

  1. Photovoltaic Growth: Reality vs. Projections of the International Energy Agency -- that graph.
  2. Digital Rights in Australia -- three aims: to assess the evolving citizen uses of digital platforms, and associated digital rights and responsibilities in Australia and Asia, identifying key dynamics and issues of voice, participation, marginalization and exclusion; to develop a framework for establishing the rights and legitimate expectations that platform stakeholders—particularly everyday users—should enjoy and the responsibilities they may bear; to identify the best models for governance arrangements for digital platforms and for using these environments as social resources in political, social, and cultural change.
  3. Unit Testing Doesn’t Affect Codebases the Way You Would Think -- nice approach to checking hypotheses like "unit testing results in fewer lines of code per method," with results (it doesn't).
  4. Capabilities for Open Source Innovation (Allison Randal) -- Over the past decade, I’ve been researching open source and technology innovation, partly through employment at multiple different companies that engage in open source, and partly through academic work toward completing a Master’s degree and soon starting a Ph.D. The heart of this research is looking into what makes companies successful at open source and also at technology innovation. It turns out there are actually many things in common between the two.

Four short links: 24 November 2017

Modern Spam, Communist Cybernetics, Computer Simulation, and Retail Big Data

  1. Spam is Back -- “The bulk of our lives online could be spammy,” Brunton said. “Our whole experience could be monetized. We could just get used—forgive my language—to really shitty content all the time.” Some say this has already happened.
  2. Communist Cybernetics -- "Cybernetics provides the theory of social control with precise quantitative methods for analyzing control processes and especially social information, which is a necessary attribute of control." So, the effect was to create a discourse in which cybernetics was emptied of its utopian promise and turned into a system for managing data about governance.
  3. SimH -- The Computer History Simulation Project. Source on GitHub.
  4. Big Data Systems, Labour, Control, and Modern Retail Stores -- It was found that retail work involves a continual movement between a governance regime of control reliant on big data systems that seek to regulate and harness formal labour and automation into enterprise planning, and a disciplinary regime that deals with the symbolic, interactive labour that workers perform and acts as a reserve mode of governmentality if control fails. This continual movement is caused by new systems of control being open to vertical and horizontal fissures. While retail functions as a coded assemblage of control, systems are too brittle to sustain the code/space and governmentality desired.

Four short links: 23 November 2017

Fuzzing, Time Series, Unix 1ed, and Failing

  1. The Art of Fuzzing -- demos here.
  2. Clustering of Time Series is Meaningless -- clusters extracted from these time series are forced to obey a certain constraint that is pathologically unlikely to be satisfied by any data set, and because of this, the clusters extracted by any clustering algorithm are essentially random. While this constraint can be intuitively demonstrated with a simple illustration and is simple to prove, it has never appeared in the literature.
  3. Run the First Edition of Unix via Docker -- In this article, you'll see how to run a PDP-11 simulator through Docker to interact with Unix as it was back in 1972. (via Simon Willison)
  4. Failing Well -- “What we’re trying to teach is that failure is not a bug of learning; it’s the feature,” said Rachel Simmons, a leadership development specialist in Smith’s Wurtele Center for Work and Life.

Four short links: 22 November 2017

Decision-making, Code Duplication, Container Security, and Information vs Attention

  1. Decision-making Under Stress -- Under acute (short-lived, high-intensity) stress, we focus on short-term rapid responses at the expense of complex thinking. Application to startup culture left as exercise to the reader.
  2. DéjàVu: A Map of Code Duplicates on GitHub -- This paper analyzes a corpus of 4.5 million non-fork projects hosted on GitHub, representing over 482 million files written in Java, C++, Python, and JavaScript. We found that this corpus has a mere 85 million unique files. (via Paper a Day)
  3. NIST Guidance on Application Container Security -- This bulletin offers an overview of application container technology and its most notable security challenges. It starts by explaining basic application container concepts and the typical application container technology architecture, including how that architecture relates to the container life cycle. Next, the article examines how the immutable nature of containers further affects security. The last portion of the article discusses potential countermeasures that may help to improve the security of application container implementations and usage.
  4. Modern Media Is a DoS Attack on Your Free Will -- What’s happened is, really rapidly, we’ve undergone this tectonic shift, this inversion between information and attention. Most of the systems that we have in society—whether it’s news, advertising, even our legal systems—still assume an environment of information scarcity. THIS.
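
The duplicate count in item 2 invites a quick experiment on your own code. The sketch below finds only exact byte-for-byte duplicates by hashing file contents; that is an assumption made for illustration rather than a description of the paper's pipeline, and the function name `duplicate_groups` is hypothetical.

```python
import hashlib
from collections import defaultdict


def duplicate_groups(files):
    """Group file names by the SHA-256 of their contents.

    `files` maps a name to its contents as bytes. Returns only the groups
    that have more than one member, each sorted by name.
    """
    by_hash = defaultdict(list)
    for name, content in files.items():
        by_hash[hashlib.sha256(content).hexdigest()].append(name)
    return [sorted(names) for names in by_hash.values() if len(names) > 1]
```

Files that differ only in whitespace or comments will not match here; catching those would need some normalization before hashing.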

Four short links: 21 November 2017

Storytelling, Decompilation, Face Detection, and Dependency Alerts

  1. Scrollama -- a modern and lightweight JavaScript library for scrollytelling. (via Nathan Yau)
  2. Dangers of the Decompiler -- a sampling of anti-decompilation techniques.
  3. An On-device Deep Neural Network for Face Detection (Apple) -- how the face unlock works, roughly at "technical blog post" levels of complexity.
  4. GitHub Security Alerts -- With your dependency graph enabled, we’ll now notify you when we detect a vulnerability in one of your dependencies and suggest known fixes from the GitHub community.

Four short links: 20 November 2017

Ancient Data, Tech Ethics, Session Replay, and Cache Filesystem

  1. Trade, Merchants, and the Lost Cities of the Bronze Age -- We analyze a large data set of commercial records produced by Assyrian merchants in the 19th Century BCE. Using the information collected from these records, we estimate a structural gravity model of long-distance trade in the Bronze Age. We use our structural gravity model to locate lost ancient cities. (via WaPo)
  2. Tech Ethics Curriculum -- a Google sheet of tech ethics courses, with pointers to syllabi.
  3. Session Replay Scripts (Ed Felten) -- lately, more and more sites use “session replay” scripts. These scripts record your keystrokes, mouse movements, and scrolling behavior, along with the entire contents of the pages you visit, and send them to third-party servers. Unlike typical analytics services that provide aggregate statistics, these scripts are intended for the recording and playback of individual browsing sessions, as if someone is looking over your shoulder. (via BoingBoing)
  4. RubiX -- Cache File System optimized for columnar formats and object stores.

Four short links: 17 November 2017

Interactive Marginalia, In-Person Interactions, Welcoming Groups, and Systems Challenges

  1. Interactive Marginalia (Liza Daly) -- wonderfully thoughtful piece about web annotations.
  2. In-Person Interactions -- Casual human interaction gives you lots of serendipitous opportunities to figure out that the problem you thought you were solving is not the most important problem, and that you should be thinking about something else. Computers aren't so good at that. So true! (via Daniel Bachhuber)
  3. Pacman Rule -- When standing as a group of people, always leave room for 1 person to join your group. (via Simon Willison)
  4. Berkeley View of Systems Challenges for AI -- In this paper, we propose several open research directions in systems, architectures, and security that can address these challenges and help unlock AI’s potential to improve lives and society.

Four short links: 16 November 2017

Regulate IoT, Visualize CRISPR, Distract Strategically, and Code Together

  1. It's Time to Regulate IoT To Improve Security -- Bruce Schneier puts it nicely: internet security is now becoming "everything" security.
  2. Real-Space and Real-Time Dynamics of CRISPR-Cas9 (Nature) -- great visuals, written up for laypeople in The Atlantic. (via Hacker News)
  3. How the Chinese Government Fabricates Social Media Posts for Strategic Distraction, not Engaged Argument -- research paper. Application to American media left as exercise to the reader.
  4. Coding Together in Real Time with Teletype for Atom -- what it says on the box.

Four short links: 15 November 2017

Paywalled Research, Reproducing AI Research, Spy Teardown, and Peer-to-Peer Misinformation

  1. 65 of the 100 Most-Cited Papers Are Paywalled -- The weighted average of all the paywalls is: $32.33 [...] [T]he open access articles in this list are, on average, cited more than the paywalled ones.
  2. AI Reproducibility -- Participants have been tasked with reproducing papers submitted to the 2018 International Conference on Learning Representations, one of AI’s biggest gatherings. The papers are anonymously published months in advance of the conference. The publishing system allows for comments to be made on those submitted papers, so students and others can add their findings below each paper. [...] Proprietary data and information used by large technology companies in their research, but withheld from papers, is holding the field back.
  3. Inside a Low-Budget Consumer Hardware Espionage Implant -- The S8 data line locator is a GSM listening and location device hidden inside the plug of a standard USB data/charging cable. Has a microphone but no GPS, remotely triggered via SMS messages, uses data to report cell tower location to a dodgy server...and is hidden in a USB cable.
  4. She Warned of ‘Peer-to-Peer Misinformation.’ Congress Listened (NY Times) -- Renee's work on anti-vaccine groups (and her college thesis on propaganda in the 2004 Russian elections) led naturally to her becoming an expert on Russian propaganda in the 2016 elections.