What is Prompt Injection?

Question

Accepted Answer

A security attack in which malicious input causes an AI system to ignore its instructions or take unintended actions, by embedding instructions inside user-provided content. Prompt injection is a class of attack targeting AI systems that use language models to process user input. In a prompt injection attack, an adversary embeds text inside a user-supplied field, document, or message that is designed to override, bypass, or manipulate the model's original instructions. If the model treats the injected text as a legitimate instruction rather than as data, it may produce outputs or take actions that the system's designers did not intend.