Radar Trends to Watch: March 2023
Developments in Quantum Computing, Biology, Hardware, and More
The past month’s news has again been dominated by AI: specifically large language models, and more specifically ChatGPT and Microsoft’s AI-driven search engine, Bing/Sydney. While there are well-known ways to make ChatGPT misbehave, it’s puzzling that Sydney was initially abusive and insulting to users who questioned its correctness, even when Sydney was clearly wrong. (It has now been restrained.) Whether intentional or not (and, when I wear my tin foil hat, I suspect that it’s intentional), Bing/Sydney’s users became part of an experiment in how humans react to an AI that’s gone rogue.
Programmers have largely become comfortable with tools like GitHub Copilot; they save time and effort, and few people feel that their jobs are threatened. The startup Fixie.ai aims to change that: founder Matt Welsh says that programming as we know it is over, and in the future, no one will need to write code. (However, humans will still need to write specifications and tests, which may be another kind of programming.)
- Facebook/Meta has announced a large language model called LLaMA that is 1/10th the size of GPT-3 and can run on a single GPU; the company claims that its performance is equivalent. A stripped-down version of LLaMA is available on GitHub.
- ChatGPT has told many users that OpenCage, a company that provides a geocoding service, offers an API for converting phone numbers to locations. ChatGPT includes Python code for using that service. That service doesn’t exist, and has never existed, but the incorrect information has driven lots of unwanted traffic (and support requests) to their site.
- The US Copyright Office has issued a ruling declaring that images generated by AI systems are not copyrightable, although other parts of a work that contains AI-generated images are.
- Matt Welsh’s vision of the future of programming: there isn’t one. Programming sucks, so let an AI do it. Humans write specifications (product managers), test and review automatically generated code, and train models to use new APIs.
- Just as relatively small modifications of an image can cause image recognition AIs to make mistakes, a tool called Glaze can make undetectable modifications to an artist’s work that make it difficult for generative art models to copy the artist’s style.
- Meta has developed a language model that can access additional information (calculators, search engines) by calling APIs. It’s trained using a small set of human-written examples showing it how to call the APIs.
- Bing/Sydney’s LLM-powered search behaves bizarrely, particularly if you question its accuracy and point it to resources with accurate information. Microsoft has since limited the length of conversations and restricted what Sydney can talk about.
- Stable Attribution attempts to find the sources behind an AI-generated image. It is far from perfect, and may be doing nothing more than finding similar images; if you give it a photo you have taken, it will happily find “source” images in the training sets used for Stable Diffusion and other image generators. Nevertheless, it is an interesting attempt to reverse the process.
- Fixie.ai has announced a new way to build software with language models: provide a small number of examples (few shot learning), and some functions that provide access to external data.
- TensorFlow.js isn’t new, but it may be catching on, as machine learning gradually moves to the browser. With better performance from WebAssembly and WebGPU, running ML applications in the browser is becoming competitive.
- Google has announced an AI chat service that will be open to the public. The service is named Bard, is based on their LaMDA language model, and is currently open to a limited group of testers.
- Gen-1 is a text-based generative model for video. Like Stable Diffusion (which was developed by the same group, Runway Research), it lets you describe what you want in a video, then edits the video reasonably precisely.
- Make-a-video (MAV3D) demonstrates an AI system that generates 3D video from a text description. It originated in Meta’s AI lab.
- A new AI algorithm helps scientists to visualize extremely large datasets.
- MusicLM is a generative language model that generates music from textual descriptions. As with other Google projects, some intriguing samples are available (the reggae is particularly good), but the model isn’t open to the public. An open-source re-implementation of MusicLM is available on GitHub.
- CarperAI has trained an AI model to modify code, rather than write it, by using the diffs between versions committed to GitHub. Using diffs gives them a model that has been tuned for fixing bugs, rather than writing new code.
- A team of researchers has developed watermarks for AI-generated text: patterns in word usage that identify a text as AI-generated. It isn’t clear when (or how) they will reach production, since that would require cooperation from the companies developing language models.
- GitHub Copilot is now responsible for 46% of developers’ code, up from 27% when it launched in June 2022.
- SQLite in the browser with WASM: What kinds of applications will this enable?
- A tour of Google’s fully homomorphic encryption (FHE) compiler. FHE does computation on encrypted data without decrypting it. An open source version of the compiler for C++ is available.
- A Gentle Introduction to CRDTs is what it says it is: an introduction to a data structure that allows independent updates to data across a network while automatically resolving conflicts. It is an extremely important tool for building software for collaboration.
- The Istio project is adding an “ambient mesh” mode that simplifies operations by eliminating the requirement for every node to have a “sidecar” proxy. The proxy layers are replaced by a “data plane mesh” that is responsible for zero-trust security and access management.
- Sam Newman’s post on developer platforms is a must-read. It’s not about building a platform, it’s about enabling developers to deliver, whatever that takes.
- Meilisearch is a powerful new open source search engine, built in Rust. It includes features like typo tolerance and search as you type.
- Not the first time we’ve said it, but: Developers will increasingly need to take regulatory requirements into account when they write code.
- Etsy provides some excellent insights on how to run a Kafka cluster in the cloud across multiple availability zones.
- Automerge 2.0 is now available. Automerge is a CRDT (conflict-free replicated data type) library. CRDTs allow multiple users to update the same data objects, consistently merging changes from multiple sources (as in Google Docs). It’s an important step towards building distributed applications.
- Oracle is moving to per-employee pricing for Java, a change that could make Java licenses much more expensive for small companies.
- WeatherMachine offers a single API adapter that can access all of the world’s best models for forecasting weather. Are adapters a new step in the API economy?
- The FBI recommends using an ad blocker when browsing the web to reduce your chances of becoming a victim of fraud.
- Attacks on the Python Package Index (PyPI), the Python code repository, continue. More than 450 malicious packages were uploaded recently, and the attacks have become more sophisticated. The malware watches the user’s clipboard for addresses of crypto wallets, and replaces them with the attacker’s wallet address.
- The Node Package Manager, NPM, has been subject to attack. Malicious packages install crypto miners on users’ computers.
- Fake ChatGPT apps are being used to spread malware.
- After breaking into a system, attackers are using an open source cross-platform command and control tool called Havoc. Havoc includes a number of modules for remote command execution, downloading additional files, and process manipulation.
- A secure API needs to properly authenticate and authorize every attempt to access it. In turn, this requires reliable and trustworthy distribution of identity data.
- The National Institute of Standards and Technology (NIST) has announced a standard “lightweight” cryptography algorithm. This algorithm has been designed for CPUs with limited capabilities, specifically CPUs used in “Internet of Things” devices.
- Bruce Schneier’s belated wrapup on SolarWinds: The market doesn’t reward security. SolarWinds was profitable, and the private equity firm that owns it wanted it to become more profitable. Short-term profit, long-term underfunding of security.
- Bruce Schneier on Machine Learning Security: we’re still in the early days of understanding how to secure ML systems against attacks. But we already know that the weakest link will be the software surrounding the ML system.
- “Capture the Flag” is frequently played at computer security conferences: in a controlled environment, defenders try to protect their systems from attackers. What happens when AI-driven agents play the game?
- The FBI and Europol have seized the servers for the Hive ransomware-as-a-service group. They penetrated Hive’s network in July 2022, allowing them to access decryption keys and give them to victims.
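
The two CRDT items in the list above (A Gentle Introduction to CRDTs and Automerge 2.0) describe data structures that accept independent updates and then merge them without conflicts. A minimal sketch of that idea, using one of the simplest CRDTs (a grow-only counter), might look like the following. This is an illustration only: the class and method names are hypothetical, and it is not Automerge's API or the design used in either linked article.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """Grow-only counter CRDT (illustrative; names are hypothetical).

    Each replica only ever increases its own slot. Merging takes the
    element-wise maximum, so merges are commutative, associative, and
    idempotent: replicas converge regardless of message order.
    """

    replica_id: str
    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # A grow-only counter cannot represent decrements.
        if amount < 0:
            raise ValueError("GCounter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum over all replicas' slots.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max keeps the latest known count for every replica.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


# Two replicas update independently, then exchange state in either order.
alice = GCounter("alice")
bob = GCounter("bob")
alice.increment(2)
bob.increment(3)
alice.merge(bob)
bob.merge(alice)
```

After the exchange both replicas report the same value, and merging the same state a second time changes nothing. Libraries such as Automerge apply the same convergence principle to far richer structures (documents, lists, text), which this sketch does not attempt to cover.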
Web, Web3, and the Metaverse
- Jaron Lanier and others have proposed that large language models can be used to create virtual worlds.
- Google will no longer downgrade AI-generated content in its search results.
- Fastly’s Fast Forward Program provides free CDN services to open source projects and nonprofits that make the world a better place. Mastodon, with its vision of open, federated social media, is one of the projects that Fastly is supporting.
- Apple is developing software to help build mixed-reality apps for the headset they are planning to release in 2023. According to rumor, the Apple headset is a different product from their AR glasses; the latter has apparently been delayed until late 2023.
- California’s DMV is putting car titles on a blockchain. Other public registries may follow. While they have not yet built public-facing applications, possibilities include NFTs that represent car titles.
- Google has made a small but significant improvement in their ability to build error-corrected qubits. They have demonstrated that error correction can scale: using more physical qubits to create a logical, error-corrected qubit reduces the actual error rate.
- A new kind of qubit adds a “flip flop” logic gate to the repertoire of quantum operations.
- Researchers have demonstrated a technique for transferring qubits from one chip to another without destroying their quantum behavior. The ability to connect quantum chips is a critical step towards building quantum computers large enough to do useful work.
- CRISPR can be used to engineer flies that are unable to spread diseases between plants. This may be a way to limit the spread of crop diseases, particularly for diseases spread by pests whose range is expanding because of global warming.
- Open source seeds? Almost all of the seeds used in farming are patented, and farmers have been sued for saving seeds to use in next year’s crops. The Open Source Seed Initiative provides seeds with a license that doesn’t restrict how the seeds are used.
- The de-extinction project has added the Dodo to the list of species it plans to restore.
- Researchers have developed a camera the size of a grain of salt. The camera incorporates neural-network based signal processing algorithms.
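
The quantum error correction item above says that using more physical qubits per logical qubit can reduce the error rate. A rough classical analogy, offered only as a sketch, is a repetition code with majority voting: copy a bit n times, let each copy flip independently with probability p, and decode by majority. This is not Google's scheme (real quantum codes must also handle phase errors and cannot simply copy states); the function name and the value p = 0.1 are assumptions chosen for illustration.

```python
from math import comb


def logical_error_rate(p: float, n: int) -> float:
    """Probability that majority voting over n noisy copies decodes wrongly.

    Assumes each copy flips independently with probability p and that n is
    odd, so a majority always exists. Decoding fails when more than half
    of the copies have flipped.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd so that a majority always exists")
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


# With a per-copy error rate below 50%, adding copies lowers the
# decoded (logical) error rate.
rates = {n: logical_error_rate(0.1, n) for n in (1, 3, 5, 7)}
```

With p = 0.1, three copies give a logical error rate of 3(0.1)^2(0.9) + (0.1)^3 = 0.028, already below the single-copy rate of 0.1, and the rate keeps falling as n grows. The same calculation with p above 0.5 shows the opposite trend, which is why redundancy only helps once the underlying error rate is low enough.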