Competition vs Convenience, Super-Contributors and Power Users, Forecasting Time-Series, and Appreciating Non-Scalability
- Less than Half of Google Searches Now Result in a Click (Sparktoro) — We can see a consistent pattern: organic shrinks while zero-click searches and paid CTR rise. But the devil’s in the details and, in this case, mostly the mobile details, where Google’s gotten more aggressive with how ads and instant answer-type features appear. Everyone has to beware of the self-serving “hey, we’re doing people a favour by taking (some action that results in greater market domination for us)” because there’s a time when the fact that you have meaningful competition is better for the user than a marginal increase in value add from keeping them in your property longer. (via Slashdot)
- Super-Contributors and Power Laws (MySociety) — Overall, two-thirds of users made only one report—but the reports made by this large set of users make up only 20% of the total number of reports. This means that different questions can lead you to very different conclusions about the service. If you’re interested in the people who are using FixMyStreet, that two-thirds is where most of the action is. If you’re interested in the outcomes of the service, this is mostly due to a much smaller group of people. This dynamic applies pretty much everywhere and is worth understanding.
- Facebook Prophet — a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well. Written in Python and R.
- On Nonscalability: The Living World Is Not Amenable to Precision-Nested Scales — to scale well is to develop the quality called scalability, that is, the ability to expand—and expand, and expand—without rethinking basic elements. […] [B]y its design, scalability allows us to see only uniform blocks, ready for further expansion. This essay recalls attention to the wild diversity of life on earth through the argument that it is time for a theory of nonscalability. (via Robin Sloan)
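The FixMyStreet split in the Super-Contributors item is easy to reproduce with toy numbers. A minimal sketch, assuming a made-up list of per-user report counts (the figures below are hypothetical, chosen only to land near the two-thirds and 20% shares quoted above):

```python
def contribution_summary(reports_per_user):
    """Return (share of users with exactly one report,
    share of all reports that those users made)."""
    total_users = len(reports_per_user)
    total_reports = sum(reports_per_user)
    single_report_users = sum(1 for n in reports_per_user if n == 1)
    # Each single-report user contributes exactly one report.
    return (single_report_users / total_users,
            single_report_users / total_reports)

# Hypothetical population: 200 one-off reporters, 70 regulars, 30 super-contributors.
counts = [1] * 200 + [4] * 70 + [17] * 30
user_share, report_share = contribution_summary(counts)
print(f"{user_share:.0%} of users made {report_share:.0%} of reports")
```

The same data answers "who uses this?" and "who produces the outcomes?" very differently, which is the point of the MySociety post.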
Content Moderation, Robust Learning, Archiving Floppies, and xkcd Charting
- Information Operations Directed at Hong Kong (Twitter) — Today we are adding archives containing complete tweet and user information for the 936 accounts we’ve disclosed to our archive of information operations—the largest of its kind in the industry. This is a goldmine for researchers, as you can see from Renee DiResta’s notes. Facebook also removed accounts for the same reason but hasn’t shared the data. Google has not taken a position yet, which prompted Alex Stamos to say, “Two of the three relevant companies have made public statements. Neither have realistic prospects in the PRC, the other does. Lots of lessons from this episode, but one might be a reinforcement of how Russia represents ‘easy mode’ for platforms doing state attribution. It’s a lot harder when the actor is financially critical, like the PRC or India.” We’re in interesting times, and research around content moderation is the most interesting thing I’ve seen on the Internet since SaaS. This work cuts to human truths, technical capability, and the limits of openness.
- Robust Learning from Untrusted Sources (Morning Paper) — designed to let you incorporate data from multiple “weakly supervised” (i.e., noisy) data sources. Snorkel replaces labels with probability-weighted labels, and then trains the final classifier using those.
- Imaging Floppies (Jason Scott) — recording the magnetic strength everywhere on the disk so you archive all the data not just the data you can read once. The result of this hardware is that it takes a 140 kilobyte floppy disk (140k) and reads it into a 20 megabyte (20,000 kilobyte) disk image. This means a LOT of the magnetic aspects of the floppy are read in for analysis. […] This doesn’t just dupe the data, but the copy protection, unique track setup, and a bunch of variance around each byte on the floppy to make it easier to work with. The software can then do all sorts of analysis to give us excellent, bootable disk images. Don’t ever think that archiving is easy, or problems are solved.
- Chart.xkcd — a chart library that plots “sketchy,” “cartoony,” or “hand-drawn” styled charts. The world needs more whimsy.
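The "probability-weighted labels" idea in the Robust Learning item can be illustrated with a toy example. This is not Snorkel's API or its label model; it is a simplified sketch that assumes independent binary label sources whose accuracies are already known (the accuracy values below are invented):

```python
import math

def probabilistic_label(votes, accuracies):
    """Combine noisy binary votes into P(label = +1).

    votes: +1, -1, or 0 (abstain) from each source.
    accuracies: assumed probability that each source votes correctly.
    Assumes independent sources and a uniform prior over the two labels.
    """
    log_odds = 0.0
    for vote, acc in zip(votes, accuracies):
        if vote == 0:
            continue  # an abstaining source carries no evidence
        log_odds += vote * math.log(acc / (1 - acc))
    return 1 / (1 + math.exp(-log_odds))

# One reliable source says +1; two weaker sources disagree with each other.
p_positive = probabilistic_label([1, 1, -1], [0.9, 0.6, 0.6])
print(round(p_positive, 3))
```

A downstream classifier would then be trained against `p_positive` as a soft target instead of a hard 0/1 label.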
Developer Tool, Deep Fakes, DNA Tests, and Retro Coding Hacks
- CROKAGE: A New Way to Search Stack Overflow — a paper about a service [that] takes the description of a programming task as a query and then provides relevant, comprehensive programming solutions containing both code snippets and their succinct explanations. There’s a replication package on GitHub. Following in the footsteps of Douglas Adams’ Electric Monk (which people bought to believe things for them) and DVRs (which people use to watch TV for them), we now have software that’ll copy dodgy code from the web for you. Programmers, software is coming for your jobs.
- Cheap Fakes Beat Deep Fakes — One of the fundamental rules of information warfare is that you never lie (except when necessary.) Deepfakes are detectable as artificial content, which reveals the lie. This discredits the source of the information and the rest of their argument. For an information warfare campaign, using deepfakes is a high-risk proposition.
- I Took 9 Different Commercial DNA Tests and Got 6 Different Results — refers to the dubious ancestry measures. “Ancestry itself is a funny thing, in that humans have never been these distinct groups of people,” said Alexander Platt, an expert in population genetics at Temple University in Philadelphia. “So, you can’t really say that somebody is 92.6 percent descended from this group of people when that’s not really a thing.”
- Dirty Tricks 6502 Programmers Use — wonderfully geeky dissection of a simple task rendered in as few bytes as possible.
Data Businesses, Data Science Class, Tiny Mouse, and Training Bias
- Making Uncommon Knowledge Common — The Rich Barton Playbook is building Data Content Loops to disintermediate incumbents and dominate Search. And then using this traction to own demand in their industries.
- Data: Past, Present, and Future — Data and data-empowered algorithms now shape our professional, personal, and political realities. This course introduces students both to critical thinking and practice in understanding how we got here, and the future we now are building together as scholars, scientists, and citizens. The way Intro to Data Science classes ought to be.
- Clever Travel Mouse — very small presenter tool, mouse, and pointer.
- Training Bias in “Hate Speech Detector” Means Black Speech More Likely to be Censored (BoingBoing) — The authors do a pretty good job of pinpointing the cause: the people who hand-labeled the training data for the algorithm were themselves biased, and incorrectly, systematically misidentified AAE writing as offensive. And since machine learning models are no better than their training data (though they are often worse!), the bias in the data propagated through the model.
Hardware Deplatforming, Hiring Groupthink, Loot Boxes and Problem Gambling, and Interoperability and Privacy
- Getting Deplatformed from Apple (BoingBoing) — It turned out that getting locked out of his Apple account made all of Luke’s Apple hardware almost useless. I think it should be illegal to do this. I believe in deplatforming (with appropriate boundaries and appeal) but breaking my hardware is bollocks.
- How to Avoid Groupthink When Hiring (HBR) — abridged process: First, make it clear to interviewers that they should not share their interview experiences with each other before the final group huddle. Next, ask each interviewer to perform a few steps before the group huddle: distill their interview rating to a single numerical score; write down their main arguments for and against hiring this person and their final conclusion. If interviewers are emailing in their numerical scores and thoughts on a candidate, don’t include the entire group in the email. Finally, the hiring managers should take note of the average score for a candidate.
- Loot Boxes a Matter of “Life or Death,” says Researcher — “‘There’s one clear message that I want to get across today, and it stands in stark contrast to mostly everything you’ve heard so far,’ Zendle said. ‘The message is this: Spending money on loot boxes is linked to problem gambling. The more money people spend on loot boxes, the more severe their problem gambling is. This isn’t just my research. This is an effect that has been replicated numerous times across the world by multiple independent labs. This is something the games industry does not engage with’.”
- Interoperability and Privacy (BoingBoing) — latest in the tear that Cory Doctorow’s been on about how to deal with the centralised power of BigSocial.
Recognizing Fact, YouTube & Brazil, Programming Zine, and Credit Blacklists
- Younger Americans are Better than Older Americans at Telling Factual News Statements from Opinions (Pew Research) — About a third of 18- to 49-year-olds (32%) correctly identified all five of the factual statements as factual, compared with two-in-ten among those ages 50 and older. A similar pattern emerges for the opinion statements. Among 18- to 49-year-olds, 44% correctly identified all five opinion statements as opinions, compared with 26% among those ages 50 and older. Or, put another way, 68% of 18- to 49-year-olds couldn’t correctly identify all five factual statements as factual? (via @pewjournalism)
- How YouTube Radicalized Brazil (NYT) — He was killing time on the site one day, he recalled, when the platform showed him a video by a right-wing blogger. He watched out of curiosity. It showed him another, and then another. “Before that, I didn’t have an ideological political background,” Mr. Martins said. YouTube’s auto-playing recommendations, he declared, were “my political education.” “It was like that with everyone,” he said.
- Paged Out — a new experimental (one article == one page) free magazine about programming (especially programming tricks!), hacking, security hacking, retro computers, modern computers, electronics, demoscene, and other similar topics.
- Credit Blacklists, Not the Solution to Every Problem — translated Chinese article on blacklists. As the aforementioned source explained, Wulian County is one of the first in Shandong Province to trial the construction of a social credit system, that began last year. The blacklist is a disciplinary measure restricted to persons within the county. It is different from the People’s Bank of China’s credit information evaluation system blacklist, or the blacklist for those deemed to be untrustworthy by the People’s Court. It does not affect the educational opportunities of anyone’s children, whether or not they themselves can ride a train or plane, and so on. Activities such as volunteering, donating blood, charitable contributions, and so on, can add to one’s personal credit (score), and can also be used to restore and upgrade credit ratings, removing themselves from the blacklist. (via ChinAI)
Retro Hacking, Explaining AI, Teacher Ratings, and Algorithmic Bias
- First Person Adventure via Mario Maker (Vice) — the remarkable “3D Maze House (P59-698-55G)” by creator ねぎちん … a level [that] somehow manages to credibly re-create the experience of playing a first-person (!!) adventure game like Wizardry, something Nintendo clearly never intended.
- Measurable Counterfactual Local Explanations for Any Classifier — generates w-counterfactual explanations that state minimum changes necessary to flip a prediction’s classification [and …] builds local regression models, using the w-counterfactuals to measure and improve the fidelity of its regressions. Making AI “explain itself” is useful and hard; this seems like an interesting step forward.
- Student Evaluation of Teaching Ratings and Student Learning are Not Related (Science Direct) — Students do not learn more from professors with higher student evaluation of teaching (SET) ratings. […] New meta-analyses of multisection studies show that SET ratings are unrelated to student learning. (via Sciblogs)
- Apparent Gender-Based Discrimination in the Display of STEM Career Ads — women disproportionately click on job ads, so bidding algorithms charge more to advertisers to show to women, so men see more job ads. (via Ethan Mollick)
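The "minimum changes necessary to flip a prediction" idea from the counterfactual explanations item has a closed form in the simplest case. A minimal sketch, assuming a plain linear classifier rather than the paper's own method; the weights, bias, and input below are hypothetical:

```python
import math

def score(weights, bias, x):
    """Linear decision score; the predicted class is the sign of this value."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def minimal_counterfactual(weights, bias, x, margin=1e-6):
    """Smallest (Euclidean) change to x that flips the sign of the score.

    Moves x straight toward the decision boundary along the weight vector,
    overshooting by a tiny margin so the prediction actually changes.
    """
    s = score(weights, bias, x)
    norm_sq = sum(w * w for w in weights)
    step = -(s + math.copysign(margin, s)) / norm_sq
    return [xi + step * w for xi, w in zip(x, weights)]

weights, bias = [2.0, -1.0], -1.0   # hypothetical model
x = [3.0, 1.0]                      # hypothetical input, scored positive
x_cf = minimal_counterfactual(weights, bias, x)
changes = [b - a for a, b in zip(x, x_cf)]  # the "explanation": what had to move
```

For non-linear classifiers there is no such formula, which is part of why the general problem is hard.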
Shadowban Patent, Abusing Unix Tools, Deblurring Photos, and Postal Vectors
- Facebook Patents Shadowbanning — which has a long history elsewhere.
- Living Off The Land in Linux — legitimate functions of Unix binaries that can be abused to break out of restricted shells, escalate or maintain elevated privileges, transfer files, spawn bind and reverse shells, and facilitate other post-exploitation tasks. Interesting to see the surprising functionality built into some utilities.
- Neural Blind Deconvolution Using Deep Priors — deblurring photos with neural nets. Very cool, and they’ve posted code. (via @roadrunning01)
- Warshipping (TechCrunch) — I mail you a package that contains a Wi-Fi sniffer with cellular connection back to me. It ships me your Wi-Fi handshake, I crack it, ship it back, now it joins your network and the game is afoot. (via BoingBoing)
Counterfeit Security, Poses in Art, Content Moderation, and iPhone Remote Attack Surface
- From The Depths Of Counterfeit Smartphones — security look at the counterfeit phones. Spoiler: they’re nasty, stay away. Both the Galaxy S10 and iPhone 6 counterfeits we assessed contained malware and rootkits. And that’s the most straightforward nastiness: even if you removed the rootkit they’d still be shocking. In the case of the “iPhone,” further digging revealed that it runs a far older version of Android: Kitkat 4.4.0. Kitkat’s last update came in 2014.
- Linking Art through Human Poses — arXiv paper that finds artwork with matching poses using OpenPose. (via MIT TR)
- A Framework for Content Moderation (Ben Thompson) — pretty good post, tackling why and where the different levels of moderation make sense.
- Fully Remote Attack Surface of the iPhone (Google Project Zero) — very interesting read, showing the detail and dead ends of a security tester. The method […] processes incoming MIME messages, and sends them to specific decoders based on the MIME type. Unfortunately, the implementation did this by appending the MIME type string from an incoming message to the string ‘decode’ and calling the resulting method. This meant that an unintended selector could be called, leading to memory corruption.
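The bug class in the Project Zero item (building a method name from attacker-controlled input and calling it) has a close analogue in Python. The sketch below is an analogy only: the original involved Objective-C selectors, and every class and method name here is hypothetical.

```python
class MessageHandler:
    def __init__(self):
        self.cache_cleared = False
        # Explicit allowlist: only these MIME types map to decoders.
        self._decoders = {"text": self.decode_text, "image": self.decode_image}

    def decode_text(self, payload):
        return payload.decode("utf-8")

    def decode_image(self, payload):
        return f"<image, {len(payload)} bytes>"

    def decode_cache_clear(self, payload):
        # Internal helper that merely happens to share the "decode_" prefix.
        self.cache_cleared = True

    def handle_unsafe(self, mime_type, payload):
        # Vulnerable pattern: the sender chooses which method gets called.
        return getattr(self, "decode_" + mime_type)(payload)

    def handle_safe(self, mime_type, payload):
        decoder = self._decoders.get(mime_type)
        if decoder is None:
            raise ValueError(f"unsupported MIME type: {mime_type!r}")
        return decoder(payload)

handler = MessageHandler()
handler.handle_unsafe("cache_clear", b"")   # reaches a method never meant for senders
```

In Python the unintended call is merely surprising; in a memory-unsafe runtime, calling a method with arguments it does not expect is how this turns into memory corruption.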
Checklists, Farewells, De-Risking, and Statistical Complexity of Brain Activity
- Why Checklists Fail (Nature) — After the NHS mandated the WHO checklist, researchers at Imperial College London launched a project to monitor the tool’s use, and found that staff were often not using it as they should. In a review of nearly 7,000 surgical procedures performed at 5 NHS hospitals, they found that the checklist was used in 97% of cases, but was completed only 62% of the time. When the researchers watched a smaller number of procedures in person, they found that practitioners often failed to give the checks their full attention, and read only two-thirds of the items out loud. In slightly more than 40% of cases, at least one team member was absent during the checks; 10% of the time, the lead surgeon was missing. If you give a checklist that ensures X to workers who don’t value X, you get workers who half-arse their way through a checklist. And, in this case, unnecessarily hurt and/or killed patients.
- Rowboats and Magic Feathers: Reflections on 13 Years of Museum 2.0 (Nina Simon) — popular social media productions twist the creators’ perceptions and become burdens. I kept to a rigorous schedule and never took a week off. Even weeks when I was giving birth, on vacation, or exhausted from challenges at work, I blogged. My attitude was, “readers don’t care what’s going on with me. They want the content.” This blog became like Dumbo’s feather. I loved it, but I also let it overpower my sense of self. As long as I was holding it — as long as I was pumping out content — I could soar. But I was terrified to let it drop. Without the blog, I presumed I could not fly. Compare Overly-Attached Girlfriend’s video on leaving YouTube. It’s hard stuff.
- De-Risking Custom Technology Projects (18F) — sweet advice.
- Distinguishing States of Conscious Arousal using Statistical Complexity — how can you tell whether someone is awake or sedated, just from their brain activity? By analysing signals from individual electrodes and disregarding spatial correlations, we find that statistical complexity distinguishes between the two states of conscious arousal through temporal correlations alone. In particular, as the degree of temporal correlations increases, the difference in complexity between the wakeful and anaesthetised states becomes larger. Uses an “epsilon machine,” which I’d not heard of before but which is a “minimal, unifilar presentation of a stationary stochastic process” (a particular type of hidden Markov model). The entropy of the epsilon machine’s states yields a measure of statistical complexity, which this paper shows maps to sedated/wake states.
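The last step described in the statistical complexity item, taking the entropy of the epsilon machine's states, is the easy part and can be sketched directly. This assumes the machine has already been inferred from data (the hard part, not shown) and is summarised as a state-to-state transition matrix; the matrices below are hypothetical.

```python
import math

def stationary_distribution(transitions, iterations=1000):
    """Approximate the long-run state probabilities by repeated multiplication.

    transitions[i][j] is the probability of moving from state i to state j.
    Assumes the chain settles to a single stationary distribution.
    """
    n = len(transitions)
    p = [1.0 / n] * n
    for _ in range(iterations):
        p = [sum(p[i] * transitions[i][j] for i in range(n)) for j in range(n)]
    return p

def statistical_complexity(transitions):
    """Shannon entropy (in bits) of the stationary distribution over states."""
    p = stationary_distribution(transitions)
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

coin_flip = [[0.5, 0.5], [0.5, 0.5]]   # two equally likely states
sticky = [[0.9, 0.1], [0.3, 0.7]]      # spends 75% of its time in state 0
c_coin = statistical_complexity(coin_flip)
c_sticky = statistical_complexity(sticky)
```

A process that needs more states, visited more evenly, to predict its future scores higher on this measure.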
Path Tracing, Games Experiences, Cinematic Visualisation, and IoT Security
- The Path to Traced Movies (Pixar) — Until recently, brute-force path tracing techniques were simply too noisy and slow to be practical for movie production rendering.[…] In this survey, we provide an overview of path tracing and highlight important milestones in its development that have led to it becoming the preferred movie rendering technique today.
- Free to Play? Hate, Harassment, and Positive Social Experiences in Online Games (ADL) — The survey found that 88 percent of adults who play online multiplayer games in the US reported positive social experiences while playing games online. The most common experiences were making friends (51%) and helping other players (50%). […] Seventy-four percent of adults who play online multiplayer games in the US experience some form of harassment while playing games online. Sixty-five percent of players experience some form of severe harassment, including physical threats, stalking, and sustained harassment. Alarmingly, nearly a third of online multiplayer gamers (29%) have been doxed.
- Cinematic Scientific Visualization: The Art of Communicating Science — slides and words from SIGGRAPH talk on advanced film-style techniques for telling science stories.
- Core Cybersecurity Feature Baseline for Securable IoT Devices: A Starting Point for IoT Device Manufacturers (NIST) — draft of some excellent guidelines to device manufacturers. Device identifiers, firmware updates and resets, data protection, disabling and restricting access to local and network interfaces, event logging, etc. Doesn’t specify how to do these things, just that manufacturers should do them. Important so we don’t build more future botfarms.
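Path tracing estimates each pixel by averaging random light-path samples, which is why the Pixar survey describes early brute-force versions as noisy and slow. A toy sketch of that trade-off, using a one-dimensional integral as a stand-in for a pixel (the integrand and sample counts are illustrative assumptions, not taken from the survey):

```python
import random

def mc_estimate(f, n, rng):
    """Average f over n uniform random samples in [0, 1)."""
    return sum(f(rng.random()) for _ in range(n)) / n

def brightness(x):
    # Stand-in integrand; its exact integral over [0, 1] is 1/3.
    return x * x

rng = random.Random(0)  # fixed seed so runs are repeatable
estimates = {n: mc_estimate(brightness, n, rng) for n in (10, 1000, 100000)}
for n, value in estimates.items():
    print(n, abs(value - 1 / 3))
```

The error of such an estimate shrinks roughly with the square root of the sample count, so halving the noise costs about four times the work.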
Innovation Policy Toolkit, Differential Privacy, Ethically-Aligned Design, Low-n Learning
- Toolkit of Policies to Promote Innovation (Journal of Economic Perspectives) — We discuss a number of the main innovation policy levers and describe the available evidence on their effectiveness: tax policies to favor research and development, government research grants, policies aimed at increasing the supply of human capital focused on innovation, intellectual property policies, and pro-competitive policies. In the conclusion, we synthesize this evidence into a single-page “toolkit,” in which we rank policies in terms of the quality and implications of the available evidence and the policies’ overall impact from a social cost-benefit perspective. We also score policies in terms of their speed and likely distributional effects. (via Marginal Revolution)
- A Brief Tour of Differential Privacy — lecture slides from a CMU course. Content warning: Comic Sans.
- Ethically-Aligned Design, 1ed — read online. The most comprehensive, crowd-sourced global treatise regarding the Ethics of Autonomous and Intelligent Systems available today.
- n-Shot Learning — brief overview of machine learning from zero, one, or a handful of examples.
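One common way to learn from "a handful of examples", as in the n-Shot Learning item, is to average each class's few examples into a prototype and label new inputs by the nearest prototype. A minimal sketch of that idea, not taken from the linked overview; the feature vectors are invented:

```python
def centroid(points):
    """Coordinate-wise mean of a list of equal-length vectors."""
    return [sum(coords) / len(points) for coords in zip(*points)]

def few_shot_classify(support, query):
    """support maps each label to its few example vectors.

    Returns the label whose prototype (mean of its examples) is closest
    to the query in squared Euclidean distance.
    """
    prototypes = {label: centroid(examples) for label, examples in support.items()}

    def dist_sq(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(prototypes, key=lambda label: dist_sq(prototypes[label], query))

# Two-shot for "cat", one-shot for "dog" (hypothetical 2-D features).
support = {"cat": [[1.0, 1.0], [1.2, 0.8]], "dog": [[4.0, 4.0]]}
prediction = few_shot_classify(support, [1.1, 1.0])
```

In practice the vectors would come from a pretrained embedding model, which is where most of the real work happens.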
Cognitive Biases, Conflict, Language Models, and Programmable Memristor Computer
- The Evolutionary Roots of Human Decision Making (NCBI) — paper showing that we share cognitive biases with other primates. In one study, monkeys had a choice between one experimenter (the gains experimenter) who started by showing the monkey one piece of apple and sometimes added an extra piece of apple, and a second experimenter (the losses experimenter) who started by showing the monkey two pieces of apple and sometimes removed one. Monkeys showed an overwhelming preference for the gains experimenter over the losses experimenter—even though they received the same payoff from both. In this way, capuchins appear to avoid options that are framed as a loss, just as humans do.
- 6 Must Reads for Cutting Through Conflict and Tough Conversations (First Round Capital) — a summary of good (?) advice from books. Some I agree with, but others … having worked for narcissists and bean counters, find a new job. Don’t stay any longer than you have to with those jerks.
- ERNIE — Baidu’s open source continual pre-training framework for language understanding. Baidu says: Integrating both phrase information and named entity information enables the model to obtain better language representation compared to BERT. ERNIE is trained on multi-source data and knowledge collected from encyclopedia articles, news, and forum dialogues, which improves its performance in context-based knowledge reasoning. See also the ERNIE paper.
- First Programmable Memristor Computer (IEEE) — The new chip combines an array of 5,832 memristors with an OpenRISC processor. 486 specially-designed digital-to-analog converters, 162 analog-to-digital converters, and two mixed-signal interfaces act as translators between the memristors’ analog computations and the main processor.
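The "analog computations" in the memristor item are usually described as a crossbar doing matrix-vector multiplication in place: voltages go in on the rows, each memristor's conductance acts as a weight, and the currents summing on each column are the result. An idealised sketch of that arithmetic (ignoring wire resistance and device noise; the array size and values are hypothetical and say nothing about this chip's layout):

```python
def crossbar_currents(voltages, conductances):
    """Column currents of an ideal crossbar.

    voltages[i]        : voltage applied to row i (volts)
    conductances[i][j] : conductance of the device at row i, column j (siemens)
    Each device passes I = V * G (Ohm's law) and the currents on a shared
    column add up (Kirchhoff's current law), giving a dot product per column.
    """
    rows, cols = len(conductances), len(conductances[0])
    return [sum(voltages[i] * conductances[i][j] for i in range(rows))
            for j in range(cols)]

voltages = [1.0, 0.5]                 # what digital-to-analog converters would supply
conductances = [[2e-6, 1e-6],
                [4e-6, 0.0]]          # the stored "weights"
currents = crossbar_currents(voltages, conductances)  # what analog-to-digital converters would read
```

The whole multiply-accumulate happens in one physical step, which is the appeal for neural network workloads.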
Software-Defined Analog Circuits, Public Domain, Talk Radio Corpus, and Bad Science
- Software-Defined Analog Circuits — Zrna hardware realizes the analog circuit you specify in software, in real time. Change any circuit parameter on the fly with an API request, at your lab bench or embedded in-application. This is … weird. But cool. Cool and weird.
- Most Pre-1964 US Books are in the Public Domain — and finally, thanks to the work of librarians and archivists, for anything that’s unambiguously a “book”, we have a parseable record of its pre-1964 interactions with the Copyright Office: the initial registration and any potential renewal. (via Evil Mad Scientist)
- RadioTalk: A Large-Scale Corpus of Talk Radio Transcripts — arXiv paper and GitHub.
- A Rough Guide to Spotting Bad Science — some very useful heuristics. Via this considered evaluation of wild claims.
Provably Correct AI, Porn & Privacy, Math for CS and ML, and Xenophobia Classifier
- ART: Abstraction Refinement-Guided Training for Provably Correct Neural Networks — provably correct neural networks, now there’s an interesting idea …
- Tracking Sex: The Implications of Widespread Sexual Data Leakage and Tracking on Porn Websites — Our analysis of 22,484 pornography websites indicated that 93% leak user data to a third party. Tracking on these sites is highly concentrated by a handful of major companies, which we identify. We successfully extracted privacy policies for 3,856 sites, 17% of the total. The policies were written such that one might need a two-year college education to understand them. Our content analysis of the sample’s domains indicated 44.97% of them expose or suggest a specific gender/sexual identity or interest likely to be linked to the user.
- Algebra, Topology, Differential Calculus, and Optimization Theory For Computer Science and Machine Learning — a 1962-page LaTeX book which some wag listed as Math Basics for CS and ML on Hacker News.
- Open-Source Xenophobia Classifier for Tweets — source is a Colab notebook and they make their labeled training data available too.
Game Translation, Modern Hypercard, Cryptographic Attacks, and Digital Hardware Debugger
- The Near Impossible 20-Year Journey to Translate “Fire Emblem: Thracia 776” (Vice) — an incredible story of translation philosophy, playing out in the context of fan attempts to make an English-language version of a 1999 tactical RPG.
- LiveCode — open-source (GPL) HyperCard-esque app developer, for the modern age. Very nice!
- Cryptographic Attacks: A Guide for the Perplexed (Checkpoint) — various types of cryptographic attacks, with a focus on the attacks’ underlying principles.
- Glasgow — FPGA-based tool for exploring digital interfaces, aimed at embedded developers, reverse engineers, digital archivists, electronics hobbyists, and everyone else who wants to communicate to a wide selection of digital devices with high reliability and minimum hassle. It can be attached to most devices without additional active or passive components, and includes extensive protection from unexpected conditions and operator error.
Email, End-to-End Encryption, AI Ethics, Reliable Distributed Systems
- Notqmail — Collaborative open source successor to qmail.
- The Encryption Debate is Over—Dead at the Hands of Facebook (Forbes) — Facebook’s model entirely bypasses the encryption debate by globalizing the current practice of compromising devices by building those encryption bypasses directly into the communications clients themselves and deploying what amounts to machine-based wiretaps to billions of users at once.
- Why Ethics Cannot be Replaced by the UDHR — Ethics and the UDHR are on the same page, if we keep it general. But questions about what is the right thing to do or what policy is the right one to implement become challenging only when these dearly held values conflict, necessarily involving trade-offs. When we dive deep, the UDHR is simply unable to guide us on those questions. Solving such challenges is the job of ethical reasoning.
- Operating a Large, Distributed System in a Reliable Way: Practices I Learned (Gergely Orosz) — This post is the collection of the practices I’ve found useful to reliably operate a large system at Uber, while working here. Generalizable beyond Uber.
Disinformation, Election Meddling, Quantum Supremacy, and International Pineapple Day
- Disinformation’s Spread: Bots, Trolls, and All of Us (Kate Starbird) — a short and on-the-mark summary of misconceptions about disinformation.
- The Unsexy Threat to Our Election Security (Krebs) — surprisingly low-tech threats (SIM stealing, hijacking a Twitter account) that could bugger up elections.
- Quantum Supremacy is Coming (Quanta) — “supremacy” is marketing hype. Quantum computers will still be useless for a while to come. “Supremacy” refers to conquering errors and noise enough to make a system that can use quantum phenomena to do in parallel what classical computers must do in serial—even if it’s only on a toy problem.
- How I Started Pineapple Day (Andrew Lee) — “That’s not a real thing,” James retorted with an eyeroll as he set his bag down and sat down at his desk. “Sure it is” I insisted, and to back my claim up I pulled up Google Calendar and added “International Bring Your Pineapple to Work Day” to our shared company calendar. I set the event to repeat every year on June 27th. Have a great weekend!