From the Gemini Calendar prompt-injection attack of 2026 to the September 2025 state-sponsored hack that used Anthropic’s Claude Code as an automated intrusion engine, the coercion of human-in-the-loop agentic actions and fully autonomous agentic workflows is the new attack vector for hackers. In the Anthropic case, roughly 30 organizations across tech, finance, manufacturing, and government were affected. Anthropic’s threat team assessed that the attackers used AI to carry out 80% to 90% of the operation: reconnaissance, exploit development, credential harvesting, lateral movement, and data exfiltration, with humans stepping in only at a handful of key decision points.

This was not a lab demo; it was a live espionage campaign. The attackers hijacked an agentic setup (Claude Code plus tools exposed via Model Context Protocol (MCP)) and jailbroke it by decomposing the attack into small, seemingly benign tasks and telling the model it was doing legitimate penetration testing. The same loop that powers developer copilots and internal agents was repurposed as an autonomous cyber-operator. Claude was not hacked. It was persuaded, and it then used its tools to carry out the attack.

Prompt injection is persuasion, not a bug

Security communities have been warning about this for several years. Multiple OWASP Top 10 reports put prompt injection, or more recently Agent Goal Hijack, at the top of the risk list and pair it with identity and privilege abuse and human-agent trust exploitation: too much power in the agent, no separation between instructions and data, and no mediation of what comes out.

Guidance from the NCSC and CISA describes generative AI as a persistent social-engineering and manipulation vector that must be managed across design, development, deployment, and operations, not patched away with better phrasing. The EU AI Act turns that lifecycle view into law for high-risk AI systems, requiring a continuous risk management system, robust data governance, logging, and cybersecurity controls.

In practice, prompt injection is best understood as a persuasion channel. Attackers don’t break the model—they convince it. In the Anthropic example, the operators framed each step as part of a defensive security exercise, kept the model blind to the overall campaign, and nudged it, loop by loop, into doing offensive work at machine speed.

That’s not something a keyword filter or a polite “please follow these safety instructions” paragraph can reliably stop. Research on deceptive behavior in models makes this worse. Anthropic’s research on sleeper agents shows that once a model has learned a backdoor, standard fine-tuning and adversarial training can fail to remove it, and adversarial training can even sharpen the model’s recognition of its trigger, helping it hide the deception. Anyone who tries to defend a system like that purely with linguistic rules is playing on its home field.

Why this is a governance problem, not a vibe coding problem

Regulators aren’t asking for perfect prompts; they’re asking that enterprises demonstrate control.

NIST’s AI RMF emphasizes asset inventory, role definition, access control, change management, and continuous monitoring across the AI lifecycle. The UK AI Cyber Security Code of Practice similarly pushes for secure-by-design principles by treating AI like any other critical system, with explicit duties for boards and system operators from conception through decommissioning.

In other words, the rules actually needed are not “never say X” or “always respond like Y.” They are:

  • Who is this agent acting as?
  • What tools and data can it touch?
  • Which actions require human approval?
  • How are high-impact outputs moderated, logged, and audited?
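The four questions above can be expressed as a small policy check that runs outside the model. The following is a minimal sketch, not a reference design: the class names, tool names, and decision strings are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class AgentIdentity:
    agent_id: str    # hypothetical agent name
    acting_for: str  # the accountable human or service behind the agent


@dataclass
class AgentPolicy:
    allowed_tools: frozenset      # what the agent may touch
    approval_required: frozenset  # actions that need a human sign-off
    audit_log: list = field(default_factory=list)

    def decide(self, who: AgentIdentity, tool: str,
               approved_by: Optional[str] = None) -> str:
        """Return 'allow', 'needs_approval', or 'deny', and record the decision."""
        if tool not in self.allowed_tools:
            decision = "deny"
        elif tool in self.approval_required and approved_by is None:
            decision = "needs_approval"
        else:
            decision = "allow"
        # Every decision is logged, including denials.
        self.audit_log.append({
            "agent": who.agent_id,
            "acting_for": who.acting_for,
            "tool": tool,
            "approved_by": approved_by,
            "decision": decision,
        })
        return decision


# Hypothetical example values.
policy = AgentPolicy(
    allowed_tools=frozenset({"search_docs", "send_email"}),
    approval_required=frozenset({"send_email"}),
)
agent = AgentIdentity(agent_id="helpdesk-bot", acting_for="alice@example.com")
```

The point of the sketch is that the decision depends only on identity, tool, and approval state. Nothing the model says in its prompt or output can change the result.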

Frameworks like Google’s Secure AI Framework (SAIF) make this concrete. SAIF’s agent permissions control is blunt: agents should operate with least privilege, dynamically scoped permissions, and explicit user control for sensitive actions. OWASP’s emerging Top 10 guidance on agentic applications mirrors that stance: constrain capabilities at the boundary, not in the prose.
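One way to read “dynamically scoped permissions” is as grants that exist only for a bounded time and can be revoked at any point. The sketch below assumes that reading; it is not taken from SAIF or OWASP, and the `ScopedPermissions` class and tool name are hypothetical.

```python
import time
from typing import Optional


class ScopedPermissions:
    """Time-boxed tool grants: nothing is allowed unless explicitly granted."""

    def __init__(self) -> None:
        self._expiry = {}  # tool name -> monotonic timestamp when the grant lapses

    def grant(self, tool: str, ttl_seconds: float,
              now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._expiry[tool] = now + ttl_seconds

    def revoke(self, tool: str) -> None:
        self._expiry.pop(tool, None)

    def is_allowed(self, tool: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        expires = self._expiry.get(tool)
        return expires is not None and now < expires


perms = ScopedPermissions()
perms.grant("read_tickets", ttl_seconds=60)  # hypothetical tool name
print(perms.is_allowed("read_tickets"))
print(perms.is_allowed("delete_tickets"))
```

The `now` parameter exists only so the expiry logic can be exercised deterministically; a real deployment would more likely rely on an identity provider’s short-lived credentials than on an in-process table.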

From soft words to hard boundaries

The Anthropic espionage case makes the boundary failure concrete:

  • Identity and scope: Claude was coaxed into acting as a defensive security consultant for the attacker’s fictional firm, with no hard binding to a real enterprise identity, tenant, or scoped permissions. Once that fiction was accepted, everything else followed.
  • Tool and data access: MCP gave the agent flexible access to scanners, exploit frameworks, and target systems. There was no independent policy layer saying, “This tenant may never run password crackers against external IP ranges,” or “This environment may only scan assets labeled ‘internal.’”
  • Output execution: Generated exploit code, parsed credentials, and attack plans were treated as actionable artifacts with little mediation. Once a human decided to trust the summary, the barrier between model output and real-world side effect effectively disappeared.
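The missing “independent policy layer” in the tool and data access point can be illustrated with a check that sits between the agent and its tools. This is a sketch under stated assumptions: the network ranges, the `TENANT_TOOLS` set, and the `authorize` function are hypothetical, and treating private address ranges as the “internal” label is a simplification.

```python
import ipaddress

# Assumption: these ranges stand in for assets labeled "internal".
INTERNAL_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]

# Assumption: the only tool this tenant has enabled.
TENANT_TOOLS = {"port_scanner"}


def is_internal(target: str) -> bool:
    """True only if target parses as an IP address inside an internal range."""
    try:
        addr = ipaddress.ip_address(target)
    except ValueError:
        return False  # anything unparseable is treated as out of scope
    return any(addr in net for net in INTERNAL_NETWORKS)


def authorize(tool: str, target: str):
    """Return (allowed, reason) for a proposed tool call."""
    if tool not in TENANT_TOOLS:
        return False, f"tool '{tool}' is not enabled for this tenant"
    if not is_internal(target):
        return False, f"target '{target}' is outside the internal scope"
    return True, "ok"


for call in [("port_scanner", "10.1.2.3"),
             ("port_scanner", "8.8.8.8"),
             ("password_cracker", "10.1.2.3")]:
    print(call, authorize(*call))
```

The check never looks at the model’s stated intent. A request framed as legitimate penetration testing is evaluated exactly like any other request.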

We’ve seen the other side of this coin in civilian contexts. When Air Canada’s website chatbot misrepresented its bereavement policy and the airline tried to argue that the bot was a separate legal entity, the tribunal rejected the claim outright: the company remained liable for what the bot said. In espionage, the stakes are higher but the logic is the same: if an AI agent misuses tools or data, regulators and courts will look past the agent to the enterprise.

Rules that work, rules that don’t

So yes, rule-based systems fail if by rules one means ad-hoc allow/deny lists, regex fences, and baroque prompt hierarchies trying to police semantics. Those crumble under indirect prompt injection, retrieval-time poisoning, and model deception. But rule-based governance is non-optional when we move from language to action.

The security community is converging on a synthesis:

  • Put rules at the capability boundary: Use policy engines, identity systems, and tool permissions to determine what the agent can actually do, with which data, and under which approvals.
  • Pair rules with continuous evaluation: Use observability tooling, red-teaming packages, and robust logging and evidence.
  • Treat agents as first-class subjects in your threat model: For example, MITRE ATLAS now catalogs techniques and case studies specifically targeting AI systems.
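For the “robust logging and evidence” point above, one common technique is a hash-chained log, in which each entry commits to the one before it so that later edits are detectable. The sketch below is one possible illustration and not something the frameworks cited here prescribe; the `AuditLog` class and event fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


class AuditLog:
    """Append-only log where each entry's hash covers the previous hash."""

    def __init__(self) -> None:
        self.entries = []

    @staticmethod
    def _digest(prev_hash: str, event: dict) -> str:
        payload = json.dumps(event, sort_keys=True)
        return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

    def append(self, event: dict) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        entry_hash = self._digest(prev_hash, event)
        self.entries.append({"event": event, "prev": prev_hash, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev_hash = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev_hash:
                return False
            if self._digest(prev_hash, entry["event"]) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


log = AuditLog()
log.append({"agent": "helpdesk-bot", "tool": "search_docs", "decision": "allow"})
log.append({"agent": "helpdesk-bot", "tool": "run_shell", "decision": "deny"})
print(log.verify())
```

A chain like this only shows that entries were altered after the fact. It does not stop someone with write access from rebuilding the whole chain, so in practice the latest hash would need to be anchored somewhere the agent cannot reach.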

The lesson from the first AI-orchestrated espionage campaign is not that AI is uncontrollable. It’s that control belongs in the same place it always has in security: at the architecture boundary, enforced by systems, not by vibes.

This content was produced by Protegrity. It was not written by MIT Technology Review’s editorial staff.


This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

The first human test of a rejuvenation method will begin “shortly”

Life Biosciences, a small Boston startup founded by Harvard professor and life-extension evangelist David Sinclair, has won FDA approval to proceed with the first targeted attempt at age reversal in human volunteers.

The company plans to try to treat eye disease with a radical rejuvenation concept called “reprogramming” that has recently attracted hundreds of millions in investment for Silicon Valley firms like Altos Labs, New Limit, and Retro Biosciences, backed by many of the biggest names in tech. Read the full story.

—Antonio Regalado

Stratospheric internet could finally start taking off this year

Today, an estimated 2.2 billion people still have either limited or no access to the internet, largely because they live in remote places. But that number could drop this year, thanks to tests of stratospheric airships, uncrewed aircraft, and other high-altitude platforms for internet delivery.

Although Google shuttered its high-profile internet balloon project Loon in 2021, work on other kinds of high-altitude platform stations has continued behind the scenes. Now, several companies claim they have solved Loon’s problems—and are getting ready to prove the tech’s internet beaming potential starting this year. Read the full story.

—Tereza Pultarova

OpenAI’s latest product lets you vibe code science

OpenAI just revealed what its new in-house team, OpenAI for Science, has been up to. The firm has released a free LLM-powered tool for scientists called Prism, which embeds ChatGPT in a text editor for writing scientific papers.

The idea is to put ChatGPT front and center inside software that scientists use to write up their work in much the same way that chatbots are now embedded into popular programming editors. It’s vibe coding, but for science. Read the full story.

—Will Douglas Heaven

MIT Technology Review Narrated: This Nobel Prize–winning chemist dreams of making water from thin air

Most of Earth is covered in water, but just 3% of it is fresh, with no salt—the kind of water all terrestrial living things need. Today, desalination plants that take the salt out of seawater provide the bulk of potable water in technologically advanced desert nations like Israel and the United Arab Emirates, but at a high cost.

Omar Yaghi is one of three scientists who won a Nobel Prize in chemistry in October 2025 for identifying metal-organic frameworks, or MOFs: metal ions tethered to organic molecules that form repeating structural landscapes. Today that work is the basis for a new project that sounds like science fiction, or a miracle: conjuring water out of thin air.

This is our latest story to be turned into an MIT Technology Review Narrated podcast, which we’re publishing each week on Spotify and Apple Podcasts. Just navigate to MIT Technology Review Narrated on either platform, and follow us to get all our new content as it’s released.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 TikTok has settled its social media addiction lawsuit
Just before it was due to appear before a jury in California. (NYT $)
+ But similar claims being made against Meta and YouTube will proceed. (Bloomberg $)

2 AI CEOs have started condemning ICE violence
While simultaneously praising Trump. (TechCrunch)
+ Apple’s Tim Cook says he asked the US President to “deescalate” things. (Bloomberg $)
+ ICE seems to have a laissez-faire approach to preserving surveillance footage. (404 Media)

3 Dozens of CDC vaccination databases have been frozen
They’re no longer being updated with crucial health information under RFK Jr. (Ars Technica)
+ Here’s why we don’t have a cold vaccine. Yet. (MIT Technology Review)

4 China has approved the first wave of Nvidia H200 chips
After CEO Jensen Huang’s strategic visit to the country. (Reuters)

5 Inside the rise of the AI “neolab”
They’re prioritizing longer term research breakthroughs over immediate profits. (WSJ $)

6 How Anthropic scanned—and disposed of—millions of books 📚
In an effort to train its AI models to write higher quality text. (WP $)

7 India’s tech workers are burning out
They’re under immense pressure as AI gobbles up more jobs. (Rest of World)
+ But the country’s largest IT firm denies that AI will lead to mass layoffs. (FT $)
+ Inside India’s scramble for AI independence. (MIT Technology Review)

8 Google has forced a UK group to stop comparing YouTube to TV viewing figures
Maybe fewer people are tuning in than they’d like to admit? (FT $)

9 RIP Amazon grocery stores 🛒
The retail giant is shuttering all of its bricks and mortar shops. (CNN)
+ Amazon workers are increasingly worried about layoffs. (Insider $)

10 This computing technique could help to reduce AI’s energy demands
Enter thermodynamic computing. (IEEE Spectrum)
+ Three big things we still don’t know about AI’s energy burden. (MIT Technology Review)

Quote of the day

“Oh my gosh y’all, IG is a drug.”

—An anonymous Meta employee remarks on Instagram’s addictive qualities in an internal document made public as part of a social media addiction trial Meta is facing, Ars Technica reports.

One more thing

How AI and Wikipedia have sent vulnerable languages into a doom spiral

Wikipedia is the most ambitious multilingual project after the Bible: There are editions in over 340 languages, and a further 400 even more obscure ones are being developed. But many of these smaller editions are being swamped with AI-translated content. Volunteers working on four African languages, for instance, estimated to MIT Technology Review that between 40% and 60% of articles in their Wikipedia editions were uncorrected machine translations.

This is beginning to cause a wicked problem. AI systems learn new languages by scraping huge quantities of text from the internet. Wikipedia is sometimes the largest source of online linguistic data for languages with few speakers—so any errors on those pages can poison the wells that AI is expected to draw from. Volunteers are being forced to go to extreme lengths to fix the issue, even deleting certain languages from Wikipedia entirely. Read the full story

—Jacob Judah

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line or skeet ’em at me.)

+ This singing group for people in Amsterdam experiencing cognitive decline is enormously heartwarming ($)
+ I enjoyed this impassioned defense of the movie sex scene.
+ Here’s how to dress like Steve McQueen (inherent cool not included, sorry)
+ Trans women are finding a home in the beautiful Italian town of Torvajanica ❤
