Four Short Links

Nat Torkington's eclectic collection of curated links.

Four short links: 23 February 2018

Malicious AI, Due Diligence, Power Laws, and Automated Threats

  1. Preparing for Malicious Uses of AI -- a new paper that urges us to acknowledge AI's dual-use nature; learn from cybersecurity; broaden the discussion.
  2. YC's Series A Diligence Checklist -- a very useful guide.
  3. So You Think You Have a Power Law, Isn't That Special? -- Lots of distributions give you straight-ish lines on a log-log plot. Abusing linear regression makes the baby Gauss cry. Use maximum likelihood to estimate the scaling exponent. Use goodness of fit to estimate where the scaling region begins. Use a goodness-of-fit test to check goodness of fit. Use Vuong's test to check alternatives, and be prepared for disappointment. Ask yourself whether you really care.
  4. OWASP Automated Threat Handbook -- provides actionable information and resources to help defend against automated threats to web applications.
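The "use maximum likelihood" advice has a closed form for continuous data (the estimator from Clauset, Shalizi, and Newman): alpha_hat = 1 + n / sum(ln(x_i / x_min)), with standard error (alpha_hat - 1) / sqrt(n). Here's a minimal Python sketch that assumes x_min is already known. Choosing x_min by goodness of fit, the goodness-of-fit test itself, and Vuong's test are all left out. The function names are made up for illustration, and the sample data is synthetic.

```python
import math
import random


def powerlaw_alpha_mle(data, x_min):
    """Continuous maximum-likelihood estimate of the scaling exponent alpha.

    Only observations at or above x_min are treated as part of the power-law tail.
    Returns (alpha_hat, standard_error).
    """
    tail = [x for x in data if x >= x_min]
    n = len(tail)
    log_sum = sum(math.log(x / x_min) for x in tail)
    if n == 0 or log_sum == 0:
        raise ValueError("need observations strictly above x_min")
    alpha = 1.0 + n / log_sum
    return alpha, (alpha - 1.0) / math.sqrt(n)


def sample_powerlaw(alpha, x_min, n, rng):
    """Draw n samples from a continuous power law by inverse-transform sampling."""
    # 1 - rng.random() lies in (0, 1], so the power is always finite.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


rng = random.Random(42)
data = sample_powerlaw(alpha=2.5, x_min=1.0, n=10_000, rng=rng)
alpha_hat, stderr = powerlaw_alpha_mle(data, x_min=1.0)
print(f"alpha = {alpha_hat:.3f} +/- {stderr:.3f}")
```

Note there's no regression on a log-log plot anywhere in there.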

Four short links: 22 February 2018

Fast Style Transfer, Categorizing Social Media Messages, Finding Secrets, and Misleading Images

  1. A Closed-form Solution to Photorealistic Image Stylization -- Experimental results show that the stylized photos generated by our algorithm are twice more preferred by human subjects in average. Moreover, our method runs 60 times faster than the state-of-the-art approach. Code available. (via Ming-Yu Liu)
  2. Characterizing Social Media Messages by How They Propagate -- Since content information is sparse and noisy on social media, adopting TraceMiner allows you to provide a high degree of classification accuracy even in the absence of content information. Experimental results on real-world data sets show the superiority over state-of-the-art approaches on the task of fake news detection and news categorization. (via Paper a Day)
  3. GitLeaks -- searches full repo history for secrets and keys.
  4. Obfuscated Gradients -- In our recent paper, we evaluate the robustness of eight papers accepted to ICLR 2018 as non-certified white-box-secure defenses to adversarial examples. We find that seven of the eight defenses provide a limited increase in robustness and can be broken by improved attack techniques we develop. It's very easy to make an image that looks to a human like one thing, but which a deep learning classifier will identify as something else. (via Dan Kaminsky)
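Why is it so easy? One well-known intuition, from Goodfellow et al.'s fast gradient sign method, is that many tiny, coordinated per-pixel nudges add up. Here's a toy Python sketch on a linear classifier, not a deep net, and not the attacks from the Obfuscated Gradients paper: moving each of 1,000 inputs by just 0.02 is enough to flip the decision. The weights, bias, and input values are made up.

```python
def score(w, b, x):
    """Linear classifier score; positive means class A, negative means class B."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sign(v):
    return (v > 0) - (v < 0)


def fgsm_linear(w, x, eps):
    """One fast-gradient-sign step that pushes a linear score downward.

    For score = w.x + b the gradient with respect to x is just w, so each
    feature moves by eps against the sign of its weight.
    """
    return [xi - eps * sign(wi) for wi, xi in zip(w, x)]


n = 1000                 # think "pixels"
w = [0.01] * n           # toy weights
b = -4.9
x = [0.5] * n            # original "image": score is about 5.0 - 4.9 = 0.1
x_adv = fgsm_linear(w, x, eps=0.02)

print(score(w, b, x), score(w, b, x_adv))         # sign flips from + to -
print(max(abs(a - c) for a, c in zip(x, x_adv)))  # each feature moved by about eps
```

Each individual change is small; it's the dimensionality that makes the attack work.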

Four short links: 21 February 2018

Fonts for Viz, Map AWS Resources, Technological Unemployment, and Design for Humans

  1. Fonts for Complex Data -- good advice. They even have a section for legal small print! (A non-designer’s first impulse is often to reach for a condensed typeface, on the principle that narrower letters take up less space. Yet, it’s almost always a better option to make the counter-intuitive choice of a wider typeface, and to set the type in a smaller size with tighter leading. Wider letters have more comfortable proportions, they’re more generously spaced, and they have more ample counters, collectively making them the more legible choice.)
  2. CloudMapper -- generates network diagrams of Amazon Web Services (AWS) environments and displays them via your browser. It helps you understand visually what exists in your accounts and identify possible network misconfigurations.
  3. Technological Unemployment -- This is my attempt to figure out what economists and experts think so I can understand the issue, and I’m writing it down to speed your going through the same process. An excellent starting point.
  4. How Technology Is Designed to Bring Out the Worst in Us -- Technology feels disempowering because we haven’t built it around an honest view of human nature.

Four short links: 20 February 2018

House Simulations, XLS diff, Apache FaaS, and Opening Closed Code

  1. House Simulator -- for AI to learn how houses work. Realistic physics, and 120 scenes based on four room categories: kitchens, living rooms, bedrooms, and bathrooms. Written in Unity.
  2. Git xltrail -- meaningful diffs of XLS files in Git repos.
  3. OpenWhisk -- a function-as-a-service package (think AWS Lambda), incubating at Apache.
  4. How to Open Up Closed Code (GDS) -- Your team may have old closed code that it needs to open. If there is a lot of closed code, this can be challenging. Here are three ways to open it up.

Four short links: 19 February 2018

Disambiguation, Learning to Code, Open Source BI, and API Hierarchy

  1. Discovering Types for Disambiguation -- clever! Clump Wikipedia entries into categories, then use the categories to see which meaning of a word (e.g., Jaguar the car, the animal, or the aircraft) best fits the other words in the sentence.
  2. Learning to Program is Getting Harder (Allen Downey) -- The problem is that GUIs hide a lot of information programmers need to know. So, when a user decides to become a programmer, they are suddenly confronted with all the information that's been hidden from them. If someone just wants to learn to program, they shouldn't have to learn operating system concepts first. (via Slashdot)
  3. Apache Superset -- a modern, enterprise-ready business intelligence web application, currently incubating at Apache.
  4. Exploring API Security -- an API ecosphere that is open by default, but actively identifies and minimizes harm, rather than over-complicating security requirements or simply performing a compliance activity. The pyramid diagram will be useful if you ever have to communicate the requirements for an API...
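To make the disambiguation idea concrete, here's a toy Python sketch of the second half only: score each candidate sense by how many of its category words appear in the sentence. The category word lists are invented for illustration, and plain word overlap stands in for whatever scoring the paper actually uses.

```python
import re

# Hypothetical category vocabularies; a real system would derive these
# from Wikipedia's category structure rather than hand-written sets.
SENSE_CONTEXT = {
    "Jaguar (car)": {"engine", "drive", "dealer", "sedan", "luxury"},
    "Jaguar (animal)": {"jungle", "prey", "cat", "rainforest", "hunt"},
    "Jaguar (aircraft)": {"pilot", "squadron", "flew", "attack", "runway"},
}


def disambiguate(sentence, senses):
    """Pick the sense whose category vocabulary overlaps most with the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    # max() returns the first sense on ties, so dict order acts as a fallback.
    return max(senses, key=lambda sense: len(senses[sense] & words))


print(disambiguate("The jaguar stalked its prey through the rainforest.", SENSE_CONTEXT))
```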

Four short links: 16 February 2018

Machine Design, Metrics, Layered Learning, and Automatically Mergeable Data Structure

  1. Towards Designing Machines -- survey of theory and approaches to building machines that can design things.
  2. Review of The Tyranny of Metrics (Tim Harford) -- Rather than rely on the informed judgment of people familiar with the situation, we gather meaningless numbers at great cost. We then use them to guide our actions, predictably causing unintended damage.
  3. Physics Travel Guide -- a tool that makes learning physics easier. Each page here contains three layers which contain explanations with increasing level of sophistication. We call these layers: layman, student and researcher. These layers make sure that readers can always find an explanation they understand. One of these for security or coding would be interesting.
  4. Automerge -- A JSON-like data structure that can be modified concurrently by different users, and merged again automatically.
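For a feel of how "merged again automatically" can work, here's a toy last-writer-wins map in Python. This is not Automerge's algorithm or API: the real library handles nested lists, text, and conflict tracking that this omits. Each write is stamped with a Lamport counter and actor ID, and merging keeps the highest stamp per key, so replicas that exchange state converge regardless of merge order.

```python
from dataclasses import dataclass, field


@dataclass
class LWWMap:
    """Toy last-writer-wins map CRDT (illustrative only, not Automerge)."""
    actor: str
    clock: int = 0
    entries: dict = field(default_factory=dict)  # key -> (timestamp, actor, value)

    def set(self, key, value):
        self.clock += 1
        self.entries[key] = (self.clock, self.actor, value)

    def get(self, key):
        entry = self.entries.get(key)
        return None if entry is None else entry[2]

    def merge(self, other):
        """Fold in another replica's state; the result doesn't depend on merge order."""
        for key, theirs in other.entries.items():
            mine = self.entries.get(key)
            # Compare (timestamp, actor) so concurrent writes resolve identically everywhere.
            if mine is None or theirs[:2] > mine[:2]:
                self.entries[key] = theirs
        # Advance the Lamport clock past anything we've seen.
        self.clock = max(self.clock, other.clock)


alice, bob = LWWMap("alice"), LWWMap("bob")
alice.set("title", "Draft")   # concurrent edits to the same key...
bob.set("title", "Final")
bob.set("tags", ["crdt"])

alice.merge(bob)
bob.merge(alice)
print(alice.get("title"), bob.get("title"))  # ...converge to the same value
```

Tie-breaking on actor ID is arbitrary but deterministic, which is the whole point: every replica picks the same winner without talking to a server.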

Four short links: 15 February 2018

Donut Drones, Consensus Algorithms, 2FA Spam, and Replacing Founders

  1. Donut Drone (IEEE) -- clever drone that is collision-safe. Nice!
  2. Hitchhiker's Guide to Consensus Algorithms -- In the world of crypto, consensus algorithms exist to prevent double spending. Here’s a quick rundown on some of the most popular consensus algorithms to date, from blockchains to DAGs and everything in-between.
  3. Facebook Spamming Users via Their 2FA Numbers (Mashable) -- when your profits are proportional to engagement, your business model turns your business into a junkie. It will cajole, stalk, berate, and trap users to feed its engagement addiction.
  4. What Happens When Startups Replace The Founder? (HBR) -- about 20% are replaced; noncompete laws help/hinder recruitment; it's overall beneficial; startups perform better when the founder leaves the company; raising external funding raises the probability that the founder will be replaced.

Four short links: 14 February 2018

CS Ethics, Experience the Retail Struggle, Front-End Interview Handbook, and Label Shift

  1. New CS Ethics Courses (NYT) -- Harvard, MIT, Stanford, and UT Austin are all offering ethics classes around the challenges that computer scientists and programmers face as they research and develop the future.
  2. American Mall -- Bloomberg's mock-retro game to illustrate the difficulties of keeping American retail malls open. I'm a huge fan of using games to let people experience/simulate a situation.
  3. Front End Interview Handbook -- Answers to front-end interview questions. I can’t begin to imagine the rate of change in this repository.
  4. Detecting and Correcting for Label Shift with Black Box Predictors -- Faced with distribution shift between training and test set, we wish to detect and quantify the shift, and to correct our classifiers without test set labels. Nice. Useful when you discover that your training set underrepresented one of the classes. (Their example: you trained on a data set with 0.2% pneumonia occurrence, but now you learn that pneumonia has 5% prevalence in the population.)
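If you already know the new prevalence, as in the pneumonia example, one standard correction (not necessarily the paper's exact procedure) is plain Bayes' rule: scale each class probability by new prior / training prior and renormalize. The paper's harder problem, estimating the new priors from a black-box model's confusion matrix and its predictions on unlabeled test data, isn't shown. A minimal Python sketch, with a made-up model output for one patient:

```python
def adjust_for_label_shift(probs, train_prior, new_prior):
    """Re-weight a classifier's posteriors p(y|x) for new class priors.

    Bayes' rule: p_new(y|x) is proportional to p_train(y|x) * new_prior[y] / train_prior[y],
    assuming p(x|y) is unchanged (the label-shift assumption).
    """
    weighted = {y: p * new_prior[y] / train_prior[y] for y, p in probs.items()}
    total = sum(weighted.values())
    return {y: v / total for y, v in weighted.items()}


train_prior = {"pneumonia": 0.002, "healthy": 0.998}  # prevalence in the training set
new_prior = {"pneumonia": 0.05, "healthy": 0.95}      # prevalence in the population
probs = {"pneumonia": 0.10, "healthy": 0.90}          # hypothetical model output

adjusted = adjust_for_label_shift(probs, train_prior, new_prior)
print(round(adjusted["pneumonia"], 2))  # 0.74
```

A 10% score from the original model becomes roughly 74% once you account for pneumonia being 25 times more common than the training data suggested.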

Four short links: 13 February 2018

Machine Learning, CSP Reporting, Remembering Learning, and Viz for Human Rights

  1. Prodigy -- Radically efficient machine teaching. An annotation tool powered by active learning.
  2. Report URI JS -- content security policies are awesome, but they are enforced in the browser, so your server never sees the requests they block. Use this script to find out what is being blocked by your CSP. (via BoingBoing)
  3. I Wrote Down Everything I Learned While Programming for a Month -- I do this and find it hugely valuable. It's one thing to say "I'm learning all the time" but another to actually be able to point to what you're learning.
  4. Visualizing Data for Human Rights Advocacy -- A guidebook and workshop activity.

Four short links: 12 February 2018

Tech vs. Culture, Fairness and Accountability, People Typeface, and Reproducibility Suite

  1. Containers Will Not Fix Your Broken Culture (Bridget Kromhout) -- words of truth in the tech industry, but "{some tech thing} will not fix your broken culture" is true everywhere (e.g., iPads in schools, chatbots in customer-hating organizations).
  2. FAT -- proceedings from the Conference on Fairness, Accountability, and Transparency in machine learning research.
  3. Wee People -- A typeface of people silhouettes, to make it easy to build web graphics featuring little people instead of dots. (via Flowing Data)
  4. Stencila -- The office suite for reproducible research. Like a cross between a word processor and a spreadsheet. Almost a Jupyter-style notebook, but WYSIWYG and with a different underlying structure. One to watch!

Four short links: 9 February 2018

Small GUI, Dangerous URLs, Face-Recognition Glasses, and The Future is Hard

  1. Nuklear -- a single-header ANSI C GUI library, with a lot of bindings (Python, Golang, C#, etc.). (via Hacker News)
  2. unfurl -- a tool that analyzes large collections of URLs and estimates their entropies to sift out URLs that might be vulnerable to attack. (via this blog)
  3. Chinese Police Using Face Recognition Glasses -- In China, people must use identity documents for train travel. This rule works to prevent people with excessive debt from using high-speed trains, and limit the movement of religious minorities who have had identity documents confiscated and can wait years to get a valid passport. We asked for glasses that would help us remember people's names, we got Robocop 0.5a/BETA2FINAL. {Obligatory "Black Mirror" reference goes here} (via BoingBoing)
  4. Why I Barely Read SF These Days (Charlie Stross) -- SF should—in my view—be draining the ocean and trying to see at a glance which of the gasping, flopping creatures on the sea bed might be lungfish. But too much SF shrugs at the state of our seas and settles for draining the local aquarium, or even just the bathtub, instead. In pathological cases, it settles for gazing into the depths of a brightly coloured computer-generated fishtank screensaver. Earlier in the essay he talks about how the first to a field defines the tropes and borders that others play in, and it's remarkably hard to find authors who can and will break out of them. (via Matt Jones)
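unfurl's exact scoring isn't described here, so treat this as an assumption: the simplest version of "estimating entropy" is per-character Shannon entropy, where random-looking tokens (session IDs, reset tokens, keys) score higher than ordinary words. A minimal Python sketch; the function names and URLs are made up.

```python
import math
from collections import Counter
from urllib.parse import urlsplit


def shannon_entropy(s):
    """Empirical Shannon entropy of a string, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def url_entropy(url):
    """Entropy of the path plus query string, ignoring scheme and host."""
    parts = urlsplit(url)
    return shannon_entropy(parts.path + parts.query)


def rank_by_entropy(urls):
    """Highest-entropy (most random-looking) URLs first."""
    return sorted(urls, key=url_entropy, reverse=True)


urls = [
    "https://example.com/about",
    "https://example.com/reset?token=9fK2xQ7vLp3Zt8Wm",
]
for u in rank_by_entropy(urls):
    print(f"{url_entropy(u):.2f}  {u}")
```

The URL carrying a token floats to the top, which is exactly the kind of URL you'd want to look at before it ends up in a public crawl.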

Four short links: 8 February 2018

Data for Problems, Quantum Algorithms, Network Transparency, and AI + Humans

  1. Solving Public Problems With Data -- an introduction to data science and data analytical thinking in the public interest. Online lecture series. Beth Noveck gives one of them. (via The Gov Lab)
  2. Quantum Algorithms: An Overview -- Here we briefly survey some known quantum algorithms, with an emphasis on a broad overview of their applications rather than their technical details. We include a discussion of recent developments and near-term applications of quantum algorithms. (via A Paper A Day)
  3. X11's Network Transparency is Largely a Failure -- Basic X clients that use X properties for everything may be genuinely network transparent, but there are very few of those left these days.
  4. How to Become a Centaur -- When you create a Human+AI team, the hard part isn’t the “AI”. It isn’t even the “Human”. It’s the “+”. Interesting history and current state of human and AI systems. (via Tom Stafford)

Four short links: 7 February 2018

Identity Advice, Customer Feedback, Fun Toy, and Reproducibility Resources

  1. 12 Best Practices for User Account, Authorization, and Password Management (Google) -- Your users are not an email address. They're not a phone number. They're not the unique ID provided by an OAUTH response. Your users are the culmination of their unique, personalized data and experience within your service. A well-designed user management system has low coupling and high cohesion between different parts of a user's profile.
  2. Customer Satisfaction at the Push of a Button (New Yorker) -- simply getting binary good/bad feedback is better than no feedback, even if it's not as good as using NPS with something like Thematic. Also an interesting story about the value of physical interactions over purely digital.
  3. XXY Oscilloscope -- try this or this to get started. (via Hacker News)
  4. Reproducibility Workshop -- slides and handouts from a workshop to highlight some of the resources available to help share code, data, reagents, and methods. (via Lenny Teytelman)

Four short links: 6 February 2018

Mine Research, Fight for Attention, AI Metaphors, and Research Browser Extensions

  1. metaDigitise -- Digitising functions in R for extracting data and summary statistics from figures in primary research papers.
  2. Center for Humane Technology -- Silicon Valley tech insiders fighting against attention-vacuuming tech design. (via New York Times)
  3. Tools, Substitutes, or Companions -- three metaphors for how we think about digital and robotic technologies. (via Tom Stafford)
  4. Unpaywall -- browser extension. Click the green tab and skip the paywall on millions of peer-reviewed journal articles. It's fast, free, and legal. Pair with the Open Access Button. (via Swarthmore Libraries)

Four short links: 5 February 2018

Company Principles, DeepFake, AGI, and Missing Devices

  1. Principles of Technology Leadership (Bryan Cantrill) -- (slides) what cultural values and principles do you want to guide *your* company? (via Bryan Cantrill)
  2. Fun With DeepFakes; or How I Got My Wife on The Tonight Show -- this is going to further erode trust. How can you know what happened if all evidence can be convincingly faked? (via Simon Willison)
  3. MIT 6.S099: Artificial General Intelligence -- The lectures will introduce our current understanding of computational intelligence and ways in which strong AI could possibly be achieved, with insights from deep learning, reinforcement learning, computational neuroscience, robotics, cognitive modeling, psychology, and more. Additional topics will include AI safety and ethics. Worth noting that we can't build an artificial general intelligence right now, and may never be able to. Don't freak out because of the course headline.
  4. Catalog of Missing Devices (EFF) -- Things we’d pay money for—things you could earn money with—don’t exist thanks to the chilling effects of an obscure copyright law: Section 1201 of the Digital Millennium Copyright Act (DMCA 1201). From "third-party consumables for 3D printers" to an "ads-free YouTube for Kids," they're good ideas.

Four short links: 2 February 2018

Digitize and Automate, Video Editor, AI + Humans, and Modest JavaScript

  1. Port Automation (Fortune) -- By digitizing and automating activities once handled by human crane operators and cargo haulers, seaports can reduce the amount of time ships sit in port and otherwise boost port productivity by up to 30%. "Digitize and automate" will be the mantra of the next decade.
  2. Shotcut -- a free, open source, cross-platform video editor.
  3. The Working Relationship Between Humans and AI (Mike Loukides) -- Whether we're talking about doctors, lawyers, engineers, Go players, or taxi drivers, we shouldn't expect AI systems to give us unchallengeable answers ex silico. We shouldn't be told that we need to "trust AI." What's important is the conversation.
  4. Stimulus -- a modest JavaScript framework for the HTML you already have.

Four short links: 1 February 2018

Tor + Bitcoin = De-anonymization, Classic Papers, 3D Holograms, and Big Data Privacy

  1. Deanonymizing Tor Hidden Service Users Through Bitcoin Transactions Analysis -- This, for example, allows an adversary to link a user with @alice Twitter address to a Tor hidden service with private.onion address by finding at least one past transaction in the blockchain that involves their publicly declared Bitcoin addresses.
  2. Great Moments in Computing -- the reading list for this Princeton course is fascinating! (via Paper a Day)
  3. Volumetric 3D Images that Float in the Air (Kurzweil AI) -- the video is impressive! Trap a particle with a laser, move it around really fast while illuminating it with red, green, and blue lights. Result, thanks to persistence of vision: illusion of 3D object. Brilliant!
  4. A Precautionary Approach to Big Data Privacy -- In Section 3, we discuss the levers that policymakers can use to influence data releases: research funding choices that incentivize collaboration between privacy theorists and practitioners, mandated transparency of re-identification risks, and innovation procurement. Meanwhile, practitioners and policymakers have numerous pragmatic options for narrower releases of data. In Section 4, we present advice for six of the most common use cases for sharing data. Our thesis is that the problem of “what to do about re-identification” unravels once we stop looking for a one-size-fits-all solution, and in each of the six cases we propose a solution that is tailored, yet principled.

Four short links: 31 January 2018

Fairness, Typesetting, Anomalies, and Faking Out Speech Recognition

  1. The Problem with Building a Fair System (Mike Loukides) -- We're ultimately after justice, not fairness. And by stopping with fairness, we are shortchanging the people most at risk. If justice is the real issue, what are we missing?
  2. Bookish -- an open source tool that translates augmented Markdown into HTML or LaTeX.
  3. -- An open source framework for real-time anomaly detection using Python, ElasticSearch, and Kibana. See also the announcement.
  4. Audio Adversarial Examples -- Given any audio waveform, we can produce another that is over 99.9% similar, but transcribes as any phrase we choose (at a rate of up to 50 characters per second). You say "potato," I say "single quote semicolon drop table users semicolon dash dash."

Four short links: 30 January 2018

Podcast Data, Data Stories, Distributed Systems, and Tech Future Scenarios

  1. Podcast Data -- Apple’s Podcast Analytics feature finally became available last month[...]. Though it’s still early days, the numbers podcasters are seeing are highly encouraging. [...] Listeners are typically getting through 80-90% of content. [...] According to Panoply, the few listeners who do skip ads continue to remain engaged with the episode, rather than dropping off at the first sign of an interruption.
  2. The Anatomy of a Data Story -- Great data stories: connect with people; try to convey one idea; keep it simple; explore a topic you know well.
  3. Designing Distributed Systems (Microsoft) -- 160 pages from Microsoft with repeatable, generic patterns, and reusable components to make developing reliable systems easier and more efficient.
  4. Scenario -- How will society change over the next 50 years? Will we still have jobs as we do today, perhaps with slightly shorter working weeks, or will the so-called "technological singularity" lead us to totally restructure our society? Perhaps reality lies somewhere in the middle. We look at three scenarios for the next few decades of technological development. From Scenario Magazine.

Four short links: 29 January 2018

Dangerous Data, Data Linter, Participatory Budgeting, and Security Wargames

  1. Aggregated Data is Dangerous Even When Aggregated -- jogging app releases visualization of all its customers' data, inadvertently exposing military bases. It is dangerous to use data for purposes other than those for which it was collected.
  2. Data Linter -- identifies potential issues (lints) in your ML training data.
  3. Participatory Budgeting -- This research identified significant challenges in the participatory budgeting sphere, from a very common lack of goals to be achieved through participatory budgeting exercises, to very weak network links and peer support for implementers, to the frustrations of the exercises as a result of political corruption or subversion. The migration to managing participatory budgeting digitally presents the very real risk of the process becoming gentrified, and is just one example of the consequences of scale in participatory budgeting only being achieved at the expense of disenfranchising the most under-represented. There are recommendations as well.
  4. Over The Wire -- wargames to help you learn and practice security concepts. (via Hacker News)