Florian Tramèr | Publications

Design Patterns for Securing LLM Agents against Prompt Injections

Luca Beurer-Kellner, Beat Buesser Ana-Maria Creţu, Edoardo Debenedetti, Daniel Dobos, Daniel Fabian, Marc Fischer, David Froelicher, Kathrin Grosse, Daniel Naeff, Ezinwanne Ozoani, Andrew Paverd, Florian Tramèr and Václav Volhejn

Links:

arXiv Twitter Simon Willison's Blog

Abstract

As AI agents powered by Large Language Models (LLMs) become increasingly versatile and capable of addressing a broad spectrum of tasks, ensuring their security has become a critical challenge. Among the most pressing threats are prompt injection attacks, which exploit the agent’s resilience on natural language inputs – an especially dangerous threat when agents are granted tool access or handle sensitive information. In this work, we propose a set of principled design patterns for building AI agents with provable resistance to prompt injection. We systematically analyze these patterns, discuss their trade-offs in terms of utility and security, and illustrate their real-world applicability through a series of case studies.

BibTeX

@misc{BCDD+25,
author	=	{Beurer-Kellner, Luca and Cre{\c t}u, Beat Buesser Ana-Maria and Debenedetti, Edoardo and Dobos, Daniel and Fabian, Daniel and Fischer, Marc and Froelicher, David and Grosse, Kathrin and Naeff, Daniel and Ozoani, Ezinwanne and Paverd, Andrew and Tram{\`e}r, Florian and Volhejn, V{\'a}clav},
title	=	{Design Patterns for Securing {LLM} Agents against Prompt Injections},
year	=	{2025},
howpublished	=	{arXiv preprint arXiv:2506.08837},
url	=	{https://arxiv.org/abs/2506.08837}
}