Coauthored with Claude

Agents are making the transition from performing tasks to running operations. The Cloudflare and Stripe partnership ships an agent that opens accounts, registers domains, and deploys an application on its own, while Stripe/Tempo and iWallet have each published machine-to-machine payment protocols to make that kind of work a standard. Office documents, browser sessions, and, in one announcement, the phone interface itself are next on the list. View the expanded role of agents as an opportunity for humans to accomplish more.

AI Models

The model menagerie keeps expanding in size and shape. Open weight contenders run at frontier capability on modest hardware, while specialist models for voice, conversation timing, and privacy filtering take over what used to be features inside one general chat model. Treat your prompts and skills as portable; the model behind them will change.

  • Anthropic has released Claude Opus 4.8. This model is not Mythos, which they expect to release soon. Opus 4.8 is a “modest improvement” that claims better results on coding and greater likelihood of informing users when it is uncertain about claims. Changes to the agents may be more important. Claude Code now has the ability to plan solutions to large problems involving hundreds of subagents (“dynamic workflows”); Cowork can control the effort put into solving a problem.
  • Cohere’s Command A+ is an open weight mixture-of-experts model with 218B parameters, 25B active. It’s competitive with frontier models and requires relatively little hardware to run: Two H100s isn’t small, but it’s not a data center either.
  • Google’s announcements at this year’s I/O conference include Omni, a new model that takes any kind of input (video, audio, image) and generates any kind of output; Gemini 3.5 Flash, a fast and efficient update to their coding model; Gemini Spark, a personal agent; and intelligent eyewear, another attempt at smart glasses.
  • Alibaba has announced Qwen3.7-Max, its most capable model.
  • Thinking Machines has announced a research preview of interaction models. These models support natural conversation flow. The model can wait for a speaker to finish, interrupt the speaker, respond when the speaker interrupts the model, and keep track of time.
  • OpenAI has released new voice models: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. They’re moving from call-and-response models to models that can take part in conversations, reason, and take actions.
  • OpenRouter published cost studies for both Claude Opus 4.7 and GPT-5.5. GPT-5.5 raised the token price but reduced the number of tokens in a typical conversation. Claude kept prices the same, but conversations tend to require more tokens. What’s the impact on your monthly bill?
  • Google has updated its Gemma 4 models, claiming that they triple token generation speed. They use a technique called multi-token prediction (MTP) to draft a sequence of tokens with a very small model and then approve those tokens with the large model.
  • IBM released Granite 4.1, a collection of small models (30B parameters and down).
  • An academic paper describes “the reasoning trap,” a phenomenon in which training models for increased reasoning also increases hallucinations about tool use.
  • Talkie is an LLM that was trained only on data from 1931 and earlier. If you want to know what it was like to live during the start of the Depression, this is the LLM to ask.
  • OpenAI has announced a privacy filter model. This is a small specialized model (1.5B) that can run on phones and other small devices. It removes personally identifiable information (PII) from text documents.
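
The draft-then-approve idea in the Gemma 4 item is easier to see in miniature. What follows is a toy sketch of the general pattern (often called speculative decoding), not Google's implementation: the two "models" are plain Python functions, and every name here (`draft_and_verify`, `small_model`, `big_model`) is hypothetical.

```python
from typing import Callable, List, Tuple

Token = int
# A toy greedy "model": given the tokens so far, return the next token.
NextToken = Callable[[List[Token]], Token]


def draft_and_verify(
    prefix: List[Token],
    draft: NextToken,
    target: NextToken,
    k: int,
    max_new: int,
) -> Tuple[List[Token], int]:
    """Generate max_new tokens; return (tokens, number of verify rounds)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    out = list(prefix)
    goal = len(prefix) + max_new
    rounds = 0
    while len(out) < goal:
        rounds += 1
        # Draft: the small model guesses the next k tokens on its own.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # Verify: keep drafted tokens for as long as the large model agrees.
        # A real system would score all k positions in one batched pass;
        # this toy asks the large model position by position.
        for guess in proposal:
            if len(out) >= goal:
                break
            approved = target(out)
            out.append(approved)
            if approved != guess:
                break  # first disagreement: keep the large model's token
    return out, rounds


def big_model(ctx: List[Token]) -> Token:
    """Hypothetical large model: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def small_model(ctx: List[Token]) -> Token:
    """Hypothetical small model: agrees with big_model except after a 4."""
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


tokens, rounds = draft_and_verify([0], small_model, big_model, k=4, max_new=8)
print(tokens, rounds)
```

Because the large model has the final say on every token, the output is the same as running the large model alone. Any speedup comes from needing fewer verification rounds than tokens whenever the small model guesses well.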

Software Development

We are beginning to see anecdotal evidence that the brief era of tokenmaxxing is coming to an end. Agents may increase productivity, but they can also use tokens at an astonishing rate. So can the latest models, like Anthropic’s Claude Opus 4.8 with new features like dynamic workflows. Employers are realizing that the only way to measure productivity is to look at the quality of an employee’s work rather than relying on an artificial (and easily gameable) metric like token use. Agents aren’t just for software development. They’re entering the rest of the workflow, including office documents, browser sessions, and legal work (without the hallucinations, we hope). At the bottom of the stack, though, the reality is that writing code by hand is still the way to understand what an agent is doing for you. Teams that use AI effectively will be disciplined about token use; they’ll choose lower cost (or local) models where possible, reaching for expensive models like Claude Opus 4.8 only when necessary.
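
Token discipline comes down to simple arithmetic: the bill is the price per token multiplied by the tokens consumed, so a model with a lower price can still cost more if its conversations run longer. Here is a minimal sketch of that calculation. The model names, prices, and token counts are made-up placeholders, not figures from any vendor or study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str
    usd_per_million_tokens: float  # hypothetical blended price
    tokens_per_conversation: int   # hypothetical typical usage


def monthly_cost(profile: ModelProfile, conversations_per_month: int) -> float:
    """Estimated monthly spend in USD for one model."""
    total_tokens = profile.tokens_per_conversation * conversations_per_month
    return total_tokens / 1_000_000 * profile.usd_per_million_tokens


# Placeholder numbers chosen only to illustrate the tradeoff.
pricey_but_terse = ModelProfile("pricey-but-terse", 30.0, 20_000)
cheap_but_chatty = ModelProfile("cheap-but-chatty", 20.0, 40_000)

for model in (pricey_but_terse, cheap_but_chatty):
    print(model.name, monthly_cost(model, conversations_per_month=1_000))
```

With these placeholder numbers, the model with the higher token price ends up with the smaller bill, which is why it pays to measure both factors on your own workload.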

  • The Agentic AI Foundation is updating the MCP protocol, with a release candidate scheduled for July 28. Changes include making MCP a stateless protocol, adding a process for creating extensions, and aligning authorization with the OAuth and OpenID standards.
  • Google is dropping Gemini CLI and putting all of its effort behind Antigravity, its agentic software development platform. There are desktop and command line versions of Antigravity, but unlike Gemini CLI, neither is open source.
  • What shall we call Gas City, created by Julian Knutsen and Chris Sells? Gas Town 2.0? Steve Yegge says it’s an SDK for building your own “dark factories” by deploying teams of collaborating agents in any topology. It’s “a pivotal moment in the Mad Max school of agent orchestration.”
  • The problem with agentic programming is that agents serve individuals, not groups, and programming is a team sport. Is collaborative steering (context management for groups) an answer?
  • GitHub has released a preview of its Copilot app, a stand-alone desktop application for coding with AI. It’s completely integrated with GitHub; for example, you can launch tasks directly from GitHub issues.
  • If you think tokenmaxxing is your path to promotion, check out burn-baby-burn. It does what it says: burns lots of tokens, fast, using the LLM of your choice. We hope it’s a parody, but we bet it works.
  • Mitchell Hashimoto tweets that Anthropic’s rewrite of Bun from Zig to Rust demonstrates that programming languages are now fungible. Programming language lock-in has ended; programs can easily move from one language to another.
  • OpenShell is a runtime environment built with security in mind from the ground up. It’s intended to be used as a secure environment for running agents. Every agent runs in its own sandbox; an external gateway manages credentials and policies.
  • OpenAI is shutting down its API for fine-tuning its models. They say the current models are better and don’t require significant fine-tuning. As Latent Space points out, this doesn’t necessarily mean the end of fine-tuning as a discipline, particularly for open models. But it may be a signal. Drew Breunig writes about what this means for agents and harnesses.
  • Anthropic has released Claude for Office 365, allowing users to run sessions that cross Word, Excel, and PowerPoint. Integration with Outlook is coming, though Claude for Outlook is currently a separate product.
  • A plugin to Chrome allows Codex to use Chrome for browser tasks that require you to be logged in—for example, reading email.
  • Firecrawl is an API that agents can use to interact with websites in a human way. It enables agents to search for the latest data, interact with the site, and return the results at scale.
  • Drew Breunig’s “10 Lessons for Agentic Coding” is an invaluable list of tips, including “Implement to learn.” Letting an agent write all the code is easy, but when you really need to learn something, write it by hand first.
  • Deepclaude configures Claude’s autonomous agent loop to use DeepSeek V4 Pro rather than one of Anthropic’s models. It’s a good way to save (DeepSeek costs much less per token) and experiment with open models. (Fair warning: The name deepclaude may change.)
  • OpenAI has announced Codex for Work, an assistant that’s designed for office work rather than software development.
  • Kanwas is a new tool for sharing context across agents. It can be used by workgroups to collaborate on projects.
  • Mike is an open source AI trained for legal work and designed to run locally.
  • GitHub is transitioning to usage-based billing for Copilot.
  • OpenAI and Qualcomm are reportedly working on a phone where the user interface is an agent. There won’t be any apps; the agent will do everything.

Infrastructure and Operations

The infrastructure questions of the moment are whether agents can transact and deploy without humans, and whether the platforms that host open source can stay reliable enough to keep that work going. Watch for GitHub alternatives to become competitive. And watch Together AI, a cloud company that hosts hundreds of open weight models.

  • TokenTuner helps control AI costs by identifying where companies can use lower-cost models productively. It attempts to match token usage to business outcomes, and evaluates individuals and teams on how effectively they use their token budget.
  • In partnership with Stripe, Cloudflare now has an agent that can create a new account, start a subscription, register a domain name with DNS, and deploy an application without human intervention aside from granting permission.
  • Stripe and Tempo have released the Machine Payments Protocol (MPP), and iWallet has laid out a roadmap for the Autonomous Settlement Protocol (ASP). These new protocols are designed to facilitate machine-to-machine transactions, transactions that have to be designed without a human in the loop.
  • The Inference Era is when inference, rather than training, drives AI usage, cost, and infrastructure. GPUs remain important, but the relative demand for CPUs increases.
  • GitHub is in danger of losing its place at the center of the open source ecosystem. Problems with uptime are causing projects to find homes elsewhere—most recently, Ghostty.
  • Together AI operates a cloud AI platform that’s designed specifically for inference rather than training and that provides API access to over 200 open weight models. As AI use increases, the ability to run models and provide answers efficiently becomes more important than the ability to train new models.

Security

The patch window is shrinking to zero, and the attacker’s toolkit and the defender’s toolkit now include the same AI models. Any vulnerability disclosed today is being exploited tonight. The good news is that defenders running these tools at scale can close gaps faster than ever; the bad news is that the race never ends.

  • FROST is a new technology for surreptitiously discovering what websites a user is visiting. It’s based on measuring the I/O operations on the user’s SSD. FROST requires no interaction from the user and runs entirely in the browser.
  • Regrettably, neither arcane prompt injection attacks nor cryptocurrency scams are news. But it warms a ham radio enthusiast’s heart to see Morse code used in a prompt injection to scam a crypto trading bot.
  • TeamPCP, a cybercriminal collective, has attacked GitHub by installing a poisoned extension to VS Code. GitHub announced that nearly 4,000 repositories have been compromised, all belonging to GitHub itself; no customer repositories have become victims. But anyone who installs corrupted code from GitHub’s own repositories is vulnerable.
  • No Security Meter for AI provides an excellent look into the state of AI security.
  • Cloudflare’s report on Project Glasswing and Claude Mythos is worth reading. Mythos is especially noteworthy for its ability to chain vulnerabilities. In real life, few vulnerabilities are exploitable on their own; they become exploitable when they are used in combination with others.
  • Daniel Stenberg reports that Mythos found five potential vulnerabilities in curl, of which one was legitimate. The low count isn’t surprising, given the quality of the curl team’s work. What’s significant is that Mythos was able to find a legitimate vulnerability in software that had been thoroughly audited by humans, traditional tools, and AI.
  • Who showed up? A security researcher ran a honeypot with port 22 open for 54 days, and logged every attempt to log in: 269,000 connection attempts from 7,556 unique IP addresses.
  • GitHub’s dependency scanning service for its MCP server is now in public preview. It checks code changes for vulnerable dependencies before committing code or opening a pull request.
  • Copy.fail is a recently discovered Linux kernel vulnerability that allows unprivileged processes to escalate privileges, and it was exploited within a day of its disclosure. Unlike with most vulnerabilities, running infected programs in a container does not offer protection. The time from disclosure of a vulnerability to its exploitation in the wild is indeed shrinking.
  • OpenAI’s Advanced Account Security requires a physical key or passkey for access; there are no passwords. Users can get a hardware key from Yubico or use a compatible hardware token.
  • GPT-5.5 Cyber is a version of GPT-5.5 that has been trained as a security tool. As Anthropic did with Mythos, OpenAI is limiting access to a small group of trusted users.
  • The Firefox team has used Claude Mythos to find 271 previously unknown vulnerabilities in Firefox. While this finding is terrifying, they conclude that defenders now have the advantage. Once you know the vulnerabilities, it’s possible to close the gap between defenders and attackers.
  • Claude Code can leak credentials and other secrets to public repos and package registries. When you select “allow always” for a specific command, the command and its credentials are stored in a subdirectory of .claude. This directory can inadvertently be incorporated into a package.
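
One way to guard against the .claude leak described in the last item is to check a package tree for that directory before publishing. The sketch below assumes only what the item states, that the sensitive data lives under a directory named .claude. The function names are hypothetical, and the check does not replace your packaging tool's own ignore rules.

```python
import os
from pathlib import Path
from typing import List


def find_claude_dirs(package_root: str) -> List[Path]:
    """Return every `.claude` directory found under package_root."""
    found: List[Path] = []
    for dirpath, dirnames, _filenames in os.walk(package_root):
        if ".claude" in dirnames:
            found.append(Path(dirpath) / ".claude")
            dirnames.remove(".claude")  # no need to look inside it
    return sorted(found)


def safe_to_publish(package_root: str) -> bool:
    """True only when no `.claude` directory would ship with the package."""
    return not find_claude_dirs(package_root)
```

Running `safe_to_publish(".")` as a prepublish step, and failing the build when it returns `False`, would catch the directory before it reaches a registry or a public repo.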

Policy and Governance

  • The ArXiv preprint repository has clarified its code of conduct for AI users. Submitters are responsible for their papers and will be banned for a year if they submit papers that use AI-generated content inappropriately. This includes hallucinated content, references, and plagiarism.
  • Look to China for new approaches to data governance. China is treating data as a national resource and building the infrastructure for a data economy.

Web

  • At its I/O conference, Google announced that traditional search will be replaced by AI search, powered by Gemini 3.5 Flash. Both AI search and traditional search (which is really AI-powered) have proven useful. What happens when you eliminate one of the options?
  • Linux running in a PDF? The PDF format supports JavaScript, and C can be compiled to JavaScript.

Biology

  • Colossal Biosciences has developed a 3D-printed artificial eggshell that’s capable of raising chicks from embryos.
  • Brazil has invested heavily in vaccines and has created a single-shot vaccine against dengue fever. The country is striving for “medical sovereignty,” a concept that’s clearly related to data sovereignty and AI sovereignty.