The most significant tension in this issue is between two companies making different decisions about how to handle AI with frontier security capabilities. Anthropic restricted Claude Mythos to a small corporate cohort through Project Glasswing. OpenAI released GPT-5.5 to general availability, and some are calling it “Mythos-like hacking, open to all.” The AI Security Institute’s evaluation confirms the capability is real and consequential. How will you manage risk when the time between discovery of a vulnerability and exploitation collapses to zero?

Another important theme is that, in the words of The Sequence, “AI is becoming operational.” It’s no longer about LLMs that can play games with words. It’s about tools that can automate processes across an enterprise: agents, of course, but more specifically agents that can be shared across a team, giving the whole group a consistent set of tools.

AI Models

The open-weight model market is reshaping the economics of AI. This cycle brought at least 10 significant model releases or updates across open and closed providers, with pricing pressure coming from multiple directions. DeepSeek now performs within a few points of Claude Opus 4.7 on coding benchmarks at a radically lower price; Alibaba, Google, Z.ai, and Moonshot all released capable open models this cycle. The Stanford AI Index documents this at scale. For organizations building on AI, the question is no longer whether open-weight alternatives are viable but which trade-offs they are willing to make on cost, portability, and support.

  • Google has published a list of 1,302 real-world use cases for generative AI. It’s very long and probably not worth reading on your own. However, you might want to point your agent at it.
  • OpenAI has announced GPT Images 2, its flagship model for generating images. The initial reaction is that it’s slightly better than Google’s Nano Banana. What distinguishes Images 2 is that it “thinks” before generating the image.
  • Anthropic used Claude to work on some problems in alignment research. Claude outperformed the humans at lower cost. The problems were, admittedly, cherry-picked to be easily scoreable. But the experiment also demonstrated that a less capable model can supervise a stronger model.
  • Moonshot Labs has released Kimi K2.6, the latest in its series of open models. It also open sourced the Kimi Vendor Verifier, a tool that tests the accuracy of vendors selling inference using Kimi.
  • Alibaba has released Qwen3.6-35B-A3B, the latest model in its Qwen series. It’s a mixture-of-experts model with 3B active parameters. Simon Willison reports that it draws great flamingos, if you consider that relevant.
  • Anthropic has released Claude Opus 4.7. The model is positioned as an intermediate step between Opus 4.6 and Claude Mythos Preview. Anthropic claims that 4.7 is better at multimodal work, including vision, instruction following, and memory use. Its new tokenizer increases the number of tokens that Claude uses. Because billing is based on tokens, that’s effectively a price increase. Simon Willison has built a tool to compare the token usage of different models.
  • Google has announced Gemini 3.1 Flash TTS, a text-to-speech model that gives extraordinary control over the speakers: accents, style, expression, and more.
  • Stanford’s 2026 AI Index Report is out, with over 400 pages of data and analysis about the state of AI.
  • Meta’s refactored AI lab has released its first model, Muse Spark. It’s a multimodal model that has been designed for integration with Meta’s products. There will eventually be a Contemplating Mode for orchestrating agents.
  • DeepSeek has released a preview version of DeepSeek-V4, its latest open-weight model. It’s a large model (over 1T parameters) with performance very close to the frontier models, but (as Simon Willison points out) running it is very inexpensive.
  • OpenAI released GPT-5.5, which some are calling “Mythos-like hacking, open to all.” In addition to being its “smartest and most intuitive” model yet, OpenAI claims that it reduces token counts, thereby reducing cost. Other sources report that, while it scores highly on benchmarks, GPT-5.5 is markedly more likely to hallucinate and provide incorrect answers.
  • Z.ai’s GLM-5.1 is a new version of the open source GLM-5 model that has been optimized to perform well on long-running tasks.
  • Google has released Gemma 4, a new version of its family of open source models. The family includes a 31B version and a mixture-of-experts version with 26B parameters, 4B active. These are all reasoning models designed for agentic workflows. One model, Gemma 4 E4B, can run on iPhone and Android devices.
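Token-based billing means that a tokenizer change is effectively a price change, as with Opus 4.7’s new tokenizer. The arithmetic is worth making explicit; here’s a minimal sketch, with hypothetical token counts and per-token rates (Anthropic’s actual tokenizer and pricing aren’t modeled):

```python
# Effective price change when a new tokenizer emits more tokens for the
# same text. All numbers are hypothetical.

def effective_cost(tokens: int, price_per_mtok: float) -> float:
    """Cost in dollars for a token count at a per-million-token rate."""
    return tokens / 1_000_000 * price_per_mtok

# Same prompt, same nominal price, but the new tokenizer emits 12% more tokens.
old_tokens = 1_000
new_tokens = 1_120
price = 15.0  # dollars per million tokens (hypothetical)

old_cost = effective_cost(old_tokens, price)
new_cost = effective_cost(new_tokens, price)
increase = (new_cost - old_cost) / old_cost

print(f"Effective price increase: {increase:.0%}")  # → 12%
```

The nominal rate never changes, but every request costs 12% more: a price increase by another name.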

Software Development

Anthropic has clearly been winning the announcement race. Whether it’s also winning on performance is a different question. Claude Code was a favorite among developers until its performance slipped. Many switched to newly released Cursor 3, which puts an agentic interface front and center while relegating the IDE to the background. Anthropic’s public postmortem on Claude Code’s behavior regression is worth reading both for its specific findings and as a model for how AI providers should communicate quality issues to developers. And Cursor’s transformation from an IDE into an agent is a pattern we expect to see repeated across the industry.

  • OpenAI has announced “workspace agents.” Workspace agents can be shared across a team, while the agents we’ve had so far are tied to an individual user. They enable a team to collaborate on building shared tools to automate workflows.
  • Microsoft has announced two new tools, Critique and Council, that use Claude and GPT together to solve research problems. Their benchmark results show that the combination works better than any model used on its own.
  • Stash is an open source memory layer that agent builders can use to connect their agents to models. We’re beginning to see an agentic stack that is composed of interchangeable modules.
  • Developers have been complaining about a drop in Claude Code’s behavior over the last few months. Anthropic has issued a response explaining what happened and how they’re fixing it.
  • Glif is an agent that tries to unify all the LLMs and tools at your disposal. You don’t have to decide which model or tool is best for each task; it makes the decision for you and gets the task done.
  • OpenAI has decoupled its agent harness from computing and storage, enabling durable long-running agents. The harness is now open source and can be customized through the Agents SDK.
  • Anthropic has announced Claude Code routines. A routine is a package that includes a prompt, a repository, and connectors that will run automatically on Anthropic’s infrastructure, either on a schedule or when triggered.
  • Anthropic also announced Claude Managed Agents, a prebuilt harness for developing agents that run on Anthropic’s infrastructure. The harness provides most of the infrastructure that an agent needs (memory management, etc.) but can be configured for the user’s tasks. Anthropic’s goal appears to be becoming the AWS of agentic AI: a service provider for tool builders.
  • Interoperability between tools, models, and plug-ins is allowing a new programming stack to develop: an orchestration layer, an execution layer, and a review layer.
  • Amazon has launched an agent registry service as part of AWS Bedrock AgentCore. Bedrock AgentCore is a collection of services that make it easy to build and deploy agents on AWS. The registry gives developers a way to discover third-party agents that might be useful to their work.
  • Bryan Cantrill’s essay on laziness is a must-read. AI isn’t lazy, and that’s a problem. When work costs nothing, there’s no need to think about future workers. Laziness is a virtue that we need to preserve.
  • Anthropic has announced Claude Design, a new tool designed to help designers. It competes directly with Figma and Canva. It’s currently in “research preview.”
  • Perplexity has launched Personal Computer, a local AI agent that runs on a dedicated Mac mini (Windows to come) and has persistent access to your files, native apps, inbox, and the web.
  • Anthropic has released a Claude plug-in for Microsoft Word, targeting the legal market. Automated edits appear as tracked changes.
  • LiteParse is a command-line tool that extracts text from PDF files. If you’ve never needed to do that, you’ve lived a blessed life. Simon Willison has built a web-based version that runs LiteParse in the browser.
  • Luke Wroblewski has said that designers should code; they need to understand their medium. But around 2014, heavyweight frameworks like React and Angular got in the way. Coding agents are now “collapsing the gap between designing and building.”
  • Cursor 3, the latest release of Cursor, relegates its IDE to the background. The main screen is designed for orchestrating agents. You can fall back to the IDE for editing code if you need to.
  • In the first quarter of 2026, Apple’s App Store saw a huge (84%) increase in the number of new apps compared to the first quarter of 2025. The cause is probably the ease of using AI to create new apps. Apple also appears to be limiting the use of “vibe coding” to create new apps and has removed several vibe coding apps from the App Store.
  • Anthropic accidentally leaked the source code for Claude Code, prompting waves of commentary. Two of the most interesting are Shlok Khemani’s tour of what he found interesting in the source and Gergely Orosz’s discussion of the legal implications.
  • “The Hidden Technical Debt of Agentic Engineering” argues that, as with machine learning, agents are relatively small parts of larger software systems, and that technical debt accumulates in all the supporting modules.
  • Chat is rarely the best interface for working with AI. Ethan Mollick writes that the current generation of AI models and agents are capable of creating task-specific interfaces on the fly.
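The orchestration/execution/review stack mentioned above can be sketched in a few lines. This is a toy illustration of the pattern, not any vendor’s API; all names and functions here are invented for the example:

```python
# Toy three-layer agent stack: an orchestrator plans steps, an executor
# runs them, and a reviewer checks the results. All names are illustrative.

from typing import Callable

def orchestrate(task: str) -> list[str]:
    """Orchestration layer: break a task into steps (hard-coded here;
    a real stack would ask a planning model)."""
    return [f"gather inputs for {task}", f"produce draft of {task}"]

def execute(step: str) -> str:
    """Execution layer: in a real stack this would call a model or tool."""
    return f"done: {step}"

def review(results: list[str], check: Callable[[str], bool]) -> bool:
    """Review layer: accept only if every step's result passes the check."""
    return all(check(r) for r in results)

results = [execute(step) for step in orchestrate("release notes")]
ok = review(results, lambda r: r.startswith("done:"))
print("approved" if ok else "rejected")  # → approved
```

The value of the layering is that each piece is swappable: the same review layer can sit on top of a different vendor’s execution layer, which is exactly the interoperability bet the new stack is making.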

Security

Security has been in the news a lot. Two core tools for secure private networking, Tor and Signal, have been attacked. In both cases, the attack didn’t involve the software or protocols themselves. These attacks teach us that secure systems are often jeopardized by the software that surrounds them. We’ve also seen that ransomware gangs are adopting postquantum encryption and that quantum computers are likely to break traditional encryption sooner than expected. If you’re not investing in security, it’s time to start.

  • The Tor network is the gold standard for secure private networking. Researchers recently discovered a vulnerability in Firefox browsers that lets attackers de-anonymize Tor users. The vulnerability has been fixed in Firefox 150, but it’s a reminder that anything can be attacked.
  • We all know that ransomware gangs use encryption. The Kyber group is making the transition to postquantum encryption.
  • A supply chain attack against npm allows bad actors to steal developers’ credentials. Once it has infected a victim, it inserts itself into other packages that the victim publishes.
  • Law enforcement agencies were briefly able to exploit a vulnerability in iOS notifications that allowed them to access unencrypted messages sent with the Signal secure messaging system. The vulnerability has been patched. It’s important to understand that the vulnerability wasn’t in Signal itself but in the environment in which it operated.
  • With AI, time from discovery of a vulnerability to exploitation has dropped to zero. To help defense catch up, Google has added three agents to its Google Security Operations platform: Threat Hunting, Detection Engineering, and Third Party Context.
  • Microsoft reports that criminals are increasingly using Teams to impersonate help desk personnel, asking users for their credentials and then stealing data.
  • NIST has stopped assigning severity scores to lower-priority vulnerabilities. All vulnerabilities will still be added to the National Vulnerability Database (NVD).
  • The NSA is using Claude Mythos Preview, despite Anthropic being blacklisted by the Pentagon. Anyone want to guess what they’re using it for?
  • Anthropic will ask for identity verification in some cases.
  • Small open-weight models can do as well as Anthropic’s Mythos at finding vulnerabilities. The key isn’t the model; it’s the system within which the model works.
  • A new malware campaign embeds credit card-stealing software in a single-pixel SVG image. Ecommerce sites using Magento Open Source or Adobe Commerce are vulnerable.
  • Anthropic has pulled its newest model, Claude Mythos, from broader release because it’s too good at finding vulnerabilities in other software. They’ve made it available to a few corporations via Project Glasswing, an attempt to secure critical software before it can be exploited. The AI Security Institute’s analysis of Claude Mythos Preview says that it “represents a step up over previous frontier models in a landscape where cyber performance was already rapidly improving.”
  • Many open source security maintainers agree with Greg Kroah-Hartman’s report that the quality of AI-generated security bug reports has gone up tremendously.
  • Versions of Claude Code that include the Vidar malware have been published on GitHub. They are based on the code that Anthropic inadvertently leaked. These versions entice victims to download them by claiming to have unlocked enterprise features.
  • Claude has been used to discover zero-day remote code execution vulnerabilities in both Vim and Emacs. The vulnerabilities are triggered when a user opens a file. An update is available for Vim; Emacs developers argue that it’s really a bug in Git, which may be correct but misses the point.
  • Breakthroughs in quantum computing mean that computers capable of cracking current encryption algorithms may be on the horizon.

Infrastructure and Operations

Multiple providers released overlapping pieces of an agent stack this cycle, covering orchestration, persistence, memory, and registry services. A three-layer model (orchestration, execution, review) is becoming the standard architecture, but each vendor’s implementation makes different bets about portability and durability. It’s important to evaluate each vendor’s products carefully before settling on an agent stack.

  • Microsoft now allows admins to uninstall Copilot, though there are conditions.
  • Google has announced two new eighth-generation TPUs. One is designed for training (8t); the other specializes in inference (8i). This is the first time Google has produced separate TPUs specialized for training and for inference.
  • Google has open-sourced Scion, its testbed for agent orchestration.
  • Anthropic has agreed to buy 3.5 gigawatts of computing power from Google and Broadcom, maker of Google’s TPUs. The deal specifies power consumption rather than the number of chips, implying that the limiting factor isn’t computation but the availability of power. Chips come and go; watts are a constant.
  • Ollama now uses Apple’s MLX framework to improve performance on Apple silicon. Support is currently limited to Qwen3.5-35B-A3B; support for other models will be added. As part of this update, Ollama also uses NVIDIA’s NVFP4 floating-point format for model quantization.
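Quantization formats like NVFP4 share one core idea: store weights at low precision, with a shared scale factor per block of weights. The sketch below shows generic block-wise 4-bit integer quantization to illustrate that idea; it is not the NVFP4 spec itself, which uses 4-bit floats and hardware-defined block sizes:

```python
# Generic block-wise 4-bit quantization: each block of weights shares one
# scale factor, and each weight is stored as a 4-bit signed integer (-8..7).
# Illustrates the idea behind formats like NVFP4; not the NVFP4 spec.

def quantize_block(block: list[float]) -> tuple[float, list[int]]:
    """Map floats to 4-bit ints, scaling by the block's max magnitude."""
    scale = max(abs(x) for x in block) / 7 or 1.0  # avoid zero scale
    return scale, [max(-8, min(7, round(x / scale))) for x in block]

def dequantize_block(scale: float, q: list[int]) -> list[float]:
    """Recover approximate floats from the scale and the 4-bit ints."""
    return [scale * v for v in q]

weights = [0.02, -0.14, 0.07, 0.005]
scale, q = quantize_block(weights)
restored = dequantize_block(scale, q)
# Each restored value is within half a quantization step of the original.
err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max error: {err:.4f}, step size: {scale:.4f}")
```

The trade-off is visible in the numbers: storage drops to 4 bits per weight plus one scale per block, at the cost of a reconstruction error bounded by the step size.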

Web

Don’t overlook the web layer when planning for AI-driven disruption. The web’s infrastructure is older than most of the people who maintain it, and several items this cycle are reminders of the gap between what that infrastructure was designed for and how it is used today. Two deal with protocols that have outlasted their original assumptions; another reimagines the dominant CMS from scratch using current tooling.

  • Is PHP the new COBOL? What about open source itself? “Who Will Maintain the Web When PHP’s Veterans Retire?” points to a reality that we don’t like to think about. Not only are companies reluctant to hire junior developers; the ones they do hire aren’t learning older technologies.
  • Laravel is apparently injecting ads for its commercial cloud service into coding agents. What happens when an open source framework takes venture funding and starts advertising to agents? We’re about to find out.
  • Doesn’t every musician need tools to typeset Gregorian chant?
  • Is IPv8 the future of the Internet? IPv6 has been “two years away” since the early 1990s. IPv8 is fully backward compatible with IPv4 and resolves IPv4’s security and address depletion issues.
  • Cloudflare has released EmDash, an alternative to WordPress based on how the web is used today. Drew Breunig calls this a reimagining: a new phase of software development in which we can use agentic programming to rethink and reimplement tools based on current needs.
  • Is BGP Safe Yet? is a web app that tests whether your ISP has implemented BGP (the protocol that’s responsible for routing packets at internet scale) correctly. Many haven’t.
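The check behind “Is BGP Safe Yet?” boils down to route origin validation: comparing an announced route against signed Route Origin Authorizations (ROAs). Here’s a minimal sketch of RFC 6811’s classification logic, using documentation prefixes and private-use ASNs as stand-in data (real validators fetch ROAs from the RPKI, which this sketch omits):

```python
# Simplified RPKI route origin validation (RFC 6811): a route is "valid"
# if a ROA covering its prefix matches its origin ASN and length limit,
# "invalid" if it is covered only by non-matching ROAs, and "not found"
# if no ROA covers it. Data below uses documentation prefixes/ASNs.
import ipaddress

# ROAs: (authorized prefix, maximum prefix length, authorized origin ASN)
roas = [
    (ipaddress.ip_network("192.0.2.0/24"), 24, 64500),
]

def validate(prefix: str, origin_asn: int) -> str:
    net = ipaddress.ip_network(prefix)
    covered = False
    for roa_net, max_len, asn in roas:
        if net.subnet_of(roa_net):
            covered = True
            if net.prefixlen <= max_len and asn == origin_asn:
                return "valid"
    return "invalid" if covered else "not found"

print(validate("192.0.2.0/24", 64500))     # → valid
print(validate("192.0.2.0/24", 64501))     # → invalid (wrong origin ASN)
print(validate("198.51.100.0/24", 64500))  # → not found (no covering ROA)
```

An ISP that has “implemented BGP correctly” in the app’s sense drops the “invalid” announcements; many still accept them, which is what makes route hijacks possible.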

Biology

  • OpenAI has announced GPT-Rosalind, a model that has been tuned for 50 common workflows in biology. Unlike most models, Rosalind has been tuned to be skeptical rather than enthusiastic or sycophantic. Access to Rosalind is limited because of the potential for harm.

Robotics

  • Spot, the Boston Dynamics robotic dog, can now read gauges and thermometers. It uses the Gemini Robotics-ER 1.6 model, which can reason about visual information.
  • Major League Baseball is using a robotic system to rule on challenges to a human umpire’s ball/strike calls.
Post topics: Radar Trends