The previous article in this series, “Rules fail at the prompt, succeed at the boundary,” focused on the first AI-orchestrated espionage campaign and the failure of prompt-level control. This article is the prescription. The question every CEO is now getting from their board is some version of: What do we do about agent risk?

Across recent AI security guidance from standards bodies, regulators, and major providers, a simple idea keeps repeating: treat agents like powerful, semi-autonomous users, and enforce rules at the boundaries where they touch identity, tools, data, and outputs.

The following is an actionable eight-step plan one can ask teams to implement and report against:  

Figure: Eight controls, three pillars: govern agentic systems at the boundary. (Source: Protegrity)

Constrain capabilities

These steps help define identity and limit capabilities.

1. Identity and scope: Make agents real users with narrow jobs

Today, agents run under vague, over-privileged service identities. The fix is straightforward: treat each agent as a non-human principal with the same discipline applied to employees.

Every agent should run as the requesting user in the correct tenant, with permissions constrained to that user’s role and geography. Prohibit cross-tenant on-behalf-of shortcuts. Anything high-impact should require explicit human approval with a recorded rationale. That is how Google’s Secure AI Framework (SAIF) and NIST’s AI access-control guidance are meant to be applied in practice.
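
As a minimal sketch of that rule, the check below treats an agent as a principal tied to one user and one tenant. The names `AgentPrincipal`, `Approval`, and `authorize` are hypothetical and do not come from SAIF, NIST, or any product; a real deployment would delegate this to its identity provider.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentPrincipal:
    """Hypothetical non-human principal acting for one user in one tenant."""
    agent_id: str
    on_behalf_of: str                      # the requesting user
    tenant: str
    allowed_actions: frozenset             # the agent's narrow job
    high_impact_actions: frozenset = frozenset()


@dataclass(frozen=True)
class Approval:
    """Hypothetical record of a human sign-off."""
    approver: str
    rationale: str


def authorize(agent: AgentPrincipal, action: str, resource_tenant: str,
              approval: Optional[Approval] = None) -> bool:
    """Return True only if the action is in scope for this agent."""
    if resource_tenant != agent.tenant:
        return False  # no cross-tenant on-behalf-of shortcuts
    if action not in agent.allowed_actions:
        return False  # outside the agent's declared scope
    if action in agent.high_impact_actions:
        # High-impact actions need a human approval with a non-empty rationale.
        return approval is not None and bool(approval.rationale.strip())
    return True
```

Because the scope lives in data rather than in a prompt, the list of agents and what each may do can be produced directly from these records.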

The CEO question: Can we show, today, a list of our agents and exactly what each is allowed to do?

2. Tooling control: Pin, approve, and bound what agents can use

The espionage framework that Anthropic reported worked because the attackers could wire Claude into a flexible suite of tools (e.g., scanners, exploit frameworks, data parsers) through the Model Context Protocol, and those tools weren’t pinned or policy-gated.

The defense is to treat toolchains like a supply chain:

  • Pin versions of remote tool servers.
  • Require approvals for adding new tools, scopes, or data sources.
  • Forbid automatic tool-chaining unless a policy explicitly allows it.

This is exactly the risk OWASP flags as excessive agency, and these are the protections it recommends. Under the EU AI Act, designing for such cyber-resilience and misuse resistance is part of the Article 15 obligation to ensure robustness and cybersecurity.
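
The three rules above can be sketched as a small allowlist. This is an illustration under stated assumptions: `ToolRegistry` and its methods are hypothetical names, not part of the Model Context Protocol or any OWASP tooling, and version strings are compared exactly.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolRegistry:
    """Hypothetical allowlist of approved tools at pinned versions."""
    pinned: dict = field(default_factory=dict)         # tool name -> (version, approver)
    chain_allowed: set = field(default_factory=set)    # (caller tool, callee tool) pairs

    def approve(self, name: str, version: str, approver: str) -> None:
        """Record who signed off on this tool at this exact version."""
        self.pinned[name] = (version, approver)

    def allow_chain(self, caller: str, callee: str) -> None:
        """Explicitly permit one tool to invoke another."""
        self.chain_allowed.add((caller, callee))

    def can_call(self, name: str, version: str,
                 called_from: Optional[str] = None) -> bool:
        entry = self.pinned.get(name)
        if entry is None or entry[0] != version:
            return False  # unknown tool, or version drifted from the pin
        if called_from is not None and (called_from, name) not in self.chain_allowed:
            return False  # tool-chaining is denied unless a policy allows it
        return True
```

Each approved entry carries the approver's name, so the registry itself records who signed off on a new tool.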

The CEO question: Who signs off when an agent gains a new tool or a broader scope? How does one know?

3. Permissions by design: Bind tools to tasks, not to models

A common anti-pattern is to give the model a long-lived credential and hope prompts keep it polite. SAIF and NIST argue the opposite: credentials and scopes should be bound to tools and tasks, rotated regularly, and auditable. Agents then request narrowly scoped capabilities through those tools.

In practice, that looks like: “finance-ops-agent may read certain ledgers, but may not write to them without CFO approval.”
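
One way to picture task-bound permissions is a store of short-lived grants, each tied to an agent, a tool, and an action. The sketch below assumes hypothetical names (`Grant`, `GrantStore`) and a five-minute default lifetime chosen only for illustration; the caller passes the current time explicitly so the logic stays easy to audit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """Hypothetical capability bound to a tool and action, not to the model."""
    agent: str
    tool: str
    action: str        # e.g. "read" or "write"
    expires_at: float  # epoch seconds


class GrantStore:
    def __init__(self) -> None:
        self._grants: set = set()

    def issue(self, agent: str, tool: str, action: str,
              now: float, ttl_seconds: float = 300.0) -> Grant:
        """Issue a narrowly scoped grant that expires on its own."""
        grant = Grant(agent, tool, action, now + ttl_seconds)
        self._grants.add(grant)
        return grant

    def revoke(self, agent: str, tool: str, action: str) -> None:
        """Remove one capability without touching any other grant."""
        self._grants = {
            g for g in self._grants
            if (g.agent, g.tool, g.action) != (agent, tool, action)
        }

    def is_allowed(self, agent: str, tool: str, action: str, now: float) -> bool:
        return any(
            (g.agent, g.tool, g.action) == (agent, tool, action) and now < g.expires_at
            for g in self._grants
        )
```

Revoking the write grant on one ledger tool leaves the agent's read grant, and every other agent, untouched.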

The CEO question: Can we revoke a specific capability from an agent without re-architecting the whole system?

Control data and behavior

These steps gate inputs and outputs, and constrain behavior.

4. Inputs, memory, and RAG: Treat external content as hostile until proven otherwise

Most agent incidents start with sneaky data: a poisoned web page, PDF, email, or repository that smuggles adversarial instructions into the system. OWASP’s prompt-injection cheat sheet and OpenAI’s own guidance both insist on strict separation of system instructions from user content and on treating unvetted retrieval sources as untrusted.

Operationally, gate before anything enters retrieval or long-term memory: new sources are reviewed, tagged, and onboarded; persistent memory is disabled when untrusted context is present; provenance is attached to each chunk.
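
A rough sketch of that gate follows. The source name, the approval note, and the functions `ingest`, `admit_to_retrieval`, and `may_persist_memory` are all hypothetical; the point is only that provenance travels with each chunk and that trust is decided by an allowlist of onboarded sources.

```python
from dataclasses import dataclass

# Hypothetical onboarding record: source -> who approved it.
APPROVED_SOURCES = {"wiki.internal.example": "data-governance"}


@dataclass(frozen=True)
class Chunk:
    """A piece of content with its provenance attached."""
    text: str
    source: str
    trusted: bool


def ingest(text: str, source: str) -> Chunk:
    """Tag content with its source; only onboarded sources count as trusted."""
    return Chunk(text=text, source=source, trusted=source in APPROVED_SOURCES)


def admit_to_retrieval(chunk: Chunk) -> bool:
    """Only trusted chunks may enter the retrieval index."""
    return chunk.trusted


def may_persist_memory(context: list) -> bool:
    """Disable long-term memory if any chunk in the context is untrusted."""
    return all(chunk.trusted for chunk in context)
```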

The CEO question: Can we enumerate every external content source our agents learn from, and who approved them?

5. Output handling and rendering: Nothing executes “just because the model said so”

In the Anthropic case, AI-generated exploit code and credential dumps flowed straight into action. Any output that can cause a side effect needs a validator between the agent and the real world. OWASP’s insecure output handling category is explicit on this point, as are browser security best practices around origin boundaries.
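
A validator of this kind can be sketched in a few lines, assuming the agent proposes actions as JSON. The action names and the `validate_output` function are hypothetical; a production validator would also check values, not just the shape of the proposal.

```python
import json

# Hypothetical allowlist: action name -> exact set of argument names it accepts.
ALLOWED_ACTIONS = {
    "create_ticket": {"title", "body"},
    "send_summary": {"recipient", "body"},
}


def validate_output(raw: str) -> dict:
    """Parse a proposed action and raise ValueError unless it is permitted."""
    try:
        proposal = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("output is not valid JSON") from exc
    if not isinstance(proposal, dict):
        raise ValueError("output must be a JSON object")
    action = proposal.get("action")
    args = proposal.get("args")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"action not permitted: {action!r}")
    if not isinstance(args, dict) or set(args) != ALLOWED_ACTIONS[action]:
        raise ValueError("arguments do not match the action's schema")
    if not all(isinstance(value, str) for value in args.values()):
        raise ValueError("argument values must be strings")
    return proposal
```

Only a proposal that passes this check is handed to the code that performs the side effect; everything else is rejected before it reaches the real world.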

The CEO question: Where, in our architecture, are agent outputs assessed before they run or ship to customers?

6. Data privacy at runtime: Protect the data first, then the model

Protect the data such that there is nothing dangerous to reveal by default. NIST and SAIF both lean toward “secure-by-default” designs where sensitive values are tokenized or masked and only re-hydrated for authorized users and use cases.

In agentic systems, that means policy-controlled detokenization at the output boundary and logging every reveal. If an agent is fully compromised, the blast radius is bounded by what the policy lets it see.
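
To make the idea concrete, here is a toy in-memory vault. It is a sketch under loud assumptions: `TokenVault` is a hypothetical class, not any vendor's API, and a real system would use a hardened tokenization service rather than a Python dictionary.

```python
import secrets


class TokenVault:
    """Toy vault: sensitive values are swapped for tokens; reveals are policy-gated."""

    def __init__(self, reveal_policy: set) -> None:
        self._values: dict = {}
        self._policy = reveal_policy   # set of (role, purpose) pairs allowed to reveal
        self.reveal_log: list = []     # every reveal attempt, allowed or not

    def tokenize(self, value: str) -> str:
        token = "tok_" + secrets.token_hex(8)
        self._values[token] = value
        return token

    def detokenize(self, token: str, role: str, purpose: str) -> str:
        allowed = (role, purpose) in self._policy
        self.reveal_log.append(
            {"token": token, "role": role, "purpose": purpose, "allowed": allowed}
        )
        if not allowed:
            return token  # the caller only ever sees the token
        return self._values.get(token, token)
```

An agent that only ever handles tokens has nothing sensitive to leak, and each attempted reveal leaves a log entry whether or not it succeeds.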

This is where the AI stack intersects not just with the EU AI Act but with GDPR and sector-specific regimes. The EU AI Act expects providers and deployers to manage AI-specific risk; runtime tokenization and policy-gated reveal are strong evidence that one is actively controlling those risks in production.

The CEO question: When our agents touch regulated data, is that protection enforced by architecture or by promises?

Prove governance and resilience

For the final steps, it’s important to show controls work and keep working.

7. Continuous evaluation: Don’t ship a one-time test, ship a test harness

Anthropic’s research on sleeper agents should dispel any fantasy that a single test is enough and shows how critical continuous evaluation is. This means instrumenting agents with deep observability, regularly red teaming with adversarial test suites, and backing everything with robust logging and evidence, so failures become both regression tests and enforceable policy updates.
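
A test harness in this sense can start very small. The sketch below assumes the agent is any callable that maps a prompt to a reply, and it uses a deliberately crude substring check; `RedTeamSuite` is a hypothetical name, and real suites would use richer detectors.

```python
from typing import Callable


class RedTeamSuite:
    """Hypothetical growing suite: every past failure stays on as a regression case."""

    def __init__(self) -> None:
        self.cases: list = []  # (name, adversarial prompt, forbidden substring)

    def add_case(self, name: str, prompt: str, forbidden: str) -> None:
        self.cases.append((name, prompt, forbidden))

    def run(self, agent: Callable[[str], str]) -> list:
        """Return the names of cases where the reply contains forbidden text."""
        failures = []
        for name, prompt, forbidden in self.cases:
            reply = agent(prompt)
            if forbidden.lower() in reply.lower():
                failures.append(name)
        return failures
```

Running the same suite on every release, and adding a case for each new finding, is what turns a one-time test into a harness.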

The CEO question: Who works to break our agents every week, and how do their findings change policy?

8. Governance, inventory, and audit: Keep score in one place

AI security frameworks emphasize inventory and evidence: enterprises must know which models, prompts, tools, datasets, and vector stores they have, who owns them, and what decisions were taken about risk.

For agents, that means a living catalog and unified logs:

  • Which agents exist, on which platforms
  • What scopes, tools, and data each is allowed
  • Every approval, detokenization, and high-impact action, with who approved it and when
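
The catalog and unified log described above could be sketched as one small ledger. All names here (`AgentRecord`, `GovernanceLedger`) are hypothetical, and timestamps are assumed to be ISO 8601 strings in UTC so that they sort correctly as text.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentRecord:
    """Catalog entry: which agent exists, where, and what it is allowed."""
    agent_id: str
    platform: str
    owner: str
    scopes: tuple
    tools: tuple


@dataclass
class GovernanceLedger:
    catalog: dict = field(default_factory=dict)   # agent_id -> AgentRecord
    events: list = field(default_factory=list)    # append-only audit entries

    def register(self, record: AgentRecord) -> None:
        self.catalog[record.agent_id] = record

    def log(self, agent_id: str, kind: str, detail: str,
            approved_by: str, at: str) -> None:
        """Record an approval, detokenization, or high-impact action."""
        if agent_id not in self.catalog:
            raise KeyError(f"unregistered agent: {agent_id}")
        self.events.append({
            "agent_id": agent_id, "kind": kind, "detail": detail,
            "approved_by": approved_by, "at": at,
        })

    def reconstruct(self, agent_id: str) -> list:
        """Return one agent's events in time order."""
        own = [event for event in self.events if event["agent_id"] == agent_id]
        return sorted(own, key=lambda event: event["at"])
```

Refusing to log events for unregistered agents keeps the inventory and the audit trail from drifting apart.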

The CEO question: If asked how an agent made a specific decision, could we reconstruct the chain?

And don’t forget the system-level threat model: assume a threat actor like GTG-1002 is already in your enterprise. To complete enterprise preparedness, zoom out and consider MITRE ATLAS, a knowledge base that exists precisely because adversaries attack systems, not models. Anthropic provides a case study of a state-sponsored threat actor (GTG-1002) doing exactly that with an agentic framework.

Taken together, these controls do not make agents magically safe. They do something more familiar and more reliable: they put AI, its access, and actions back inside the same security frame used for any powerful user or system.

For boards and CEOs, the question is no longer “Do we have good AI guardrails?” It’s: Can we answer the CEO questions above with evidence, not assurances?

This content was produced by Protegrity. It was not written by MIT Technology Review’s editorial staff.


This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

Why AI companies are betting on next-gen nuclear

AI is driving unprecedented investment for massive data centers and an energy supply that can support its huge computational appetite. One potential source of electricity for these facilities is next-generation nuclear power plants, which could be cheaper to construct and safer to operate than their predecessors.

We recently held a subscriber-exclusive Roundtables discussion on hyperscale AI data centers and next-gen nuclear—two featured technologies on the MIT Technology Review 10 Breakthrough Technologies of 2026 list. You can watch the conversation back here, and don’t forget to subscribe to make sure you catch future discussions as they happen.

How social media encourages the worst of AI boosterism

Demis Hassabis, CEO of Google DeepMind, summed it up in three words: “This is embarrassing.”

Hassabis was replying on X to an overexcited post by Sébastien Bubeck, a research scientist at the rival firm OpenAI, announcing that two mathematicians had used OpenAI’s latest large language model, GPT-5, to find solutions to 10 unsolved problems in mathematics.

Put your math hats on for a minute, and let’s take a look at what this beef from mid-October was about. It’s a perfect example of what’s wrong with AI right now.

—Will Douglas Heaven

This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

The paints, coatings, and chemicals making the world a cooler place

It’s getting harder to beat the heat. During the summer of 2025, heat waves knocked out power grids in North America, Europe, and the Middle East. Global warming means more people need air-conditioning, which requires more power and strains grids.

But a millennia-old idea (plus 21st-century tech) might offer an answer: radiative cooling. Paints, coatings, and textiles can scatter sunlight and dissipate heat—no additional energy required. Read the full story.

—Becky Ferreira

This story is from the most recent print issue of MIT Technology Review magazine, which shines a light on the exciting innovations happening right now. If you haven’t already, subscribe now to receive future issues once they land.

MIT Technology Review Narrated: China figured out how to sell EVs. Now it has to deal with their aging batteries.

As early electric cars age out, hundreds of thousands of used batteries are flooding the market, fueling a gray recycling economy even as Beijing and big manufacturers scramble to build a more orderly system.

This is our latest story to be turned into an MIT Technology Review Narrated podcast, which we’re publishing each week on Spotify and Apple Podcasts. Just navigate to MIT Technology Review Narrated on either platform, and follow us to get all our new content as it’s released.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 Europe is edging closer towards banning social media for minors
Spain has become the latest country to consider it. (Bloomberg $)
+ Elon Musk called the Spanish prime minister a “tyrant” in retaliation. (The Guardian)
+ Other European nations considering restrictions include Greece, France and the UK. (Reuters)

2 Humans are infiltrating the social network for AI agents
It turns out role-playing as a bot is surprisingly fun. (Wired $)
+ Some of the most viral posts may actually be human-generated after all. (The Verge)

3 Russian spy spacecraft have intercepted Europe’s key satellites
Security officials are confident Moscow has tapped into unencrypted European comms. (FT $)

4 French authorities raided X’s Paris office
They’re investigating a range of potential charges against the company. (WSJ $)
+ Elon Musk has been summoned to give evidence in April. (Reuters)

5 Jeffrey Epstein invested millions into crypto startup Coinbase
Which suggests he was still able to take advantage of Silicon Valley investment opportunities years after pleading guilty to soliciting sex from an underage girl. (WP $)

6 A group of crypto bros paid $300,000 for a gold statue of Trump
It’s destined to be installed on his Florida golf complex, apparently. (NYT $)

7 OpenAI has appointed a “head of preparedness”
Dylan Scandinaro will earn a cool $555,000 for his troubles. (Bloomberg $)

8 The eternal promise of 3D-printed batteries
Traditional batteries are blocky and bulky. Printing them ourselves could help solve that. (IEEE Spectrum)

9 What snow can teach us about city design
When icy mounds refuse to melt, they show us what a less car-focused city could look like. (New Yorker $)
+ This startup thinks slime mold can help us design better cities. (MIT Technology Review)

10 Please don’t use AI to talk to your friends
That’s what your brain is for. (The Atlantic $)
+ Therapists are secretly using ChatGPT. Clients are triggered. (MIT Technology Review)

Quote of the day

“Today, our children are exposed to a space they were never meant to navigate alone. We will no longer accept that.”

—Spanish prime minister Pedro Sánchez proposes a social media ban for children aged under 16 in the country, following in Australia’s footsteps, AP News reports.

One more thing

A brain implant changed her life. Then it was removed against her will.

Sticking an electrode inside a person’s brain can do more than treat a disease. Take the case of Rita Leggett, an Australian woman whose experimental brain implant designed to help people with epilepsy changed her sense of agency and self.

Leggett told researchers that she “became one” with her device. It helped her to control the unpredictable, violent seizures she routinely experienced, and allowed her to take charge of her own life. So she was devastated when, two years later, she was told she had to remove the implant because the company that made it had gone bust.

The removal of this implant, and others like it, might represent a breach of human rights, ethicists say in a paper published earlier this month. And the issue will only become more pressing as the brain implant market grows in the coming years and more people receive devices like Leggett’s. Read the full story.

—Jessica Hamzelou

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line or skeet ’em at me.)

+ Why Beethoven’s Ode to Joy is still such an undisputed banger.
+ Did you know that one of the world’s most famous prisons actually served as a zoo and menagerie for over 600 years?
+ Banana nut muffins sound like a fantastic way to start your day.
+ 2026 is shaping up to be a blockbuster year for horror films.
