Technique · defense-evasion · ATLAS

AML.T0068 LLM Prompt Obfuscation

What it is

Adversaries may hide or otherwise obfuscate prompt injections or retrieval content to avoid detection by humans, large language model (LLM) guardrails, or other detection mechanisms. For text inputs, this may include altering how the instructions are rendered, for example with very small text, text colored the same as the background, or hidden HTML elements. For multi-modal inputs, malicious instructions could be hidden in the data itself (e.g., in the pixels of an image) or in file metadata (e.g., EXIF for images, ID3 tags for audio, or document metadata). Inputs can also be obscured with an encoding scheme such as base64 or rot13. Such obfuscation may bypass LLM guardrails that look for malicious content, and the instructions may be harder for a human in the loop to recognize as malicious.
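To illustrate why encoding can slip past a guardrail, here is a minimal defensive sketch in Python. It assumes a hypothetical keyword blocklist standing in for an LLM guardrail (real guardrails are usually classifiers), and the names `BLOCKLIST`, `naive_guardrail`, `decoded_views`, and `normalizing_guardrail` are illustrative only, not part of ATLAS or any library. The naive check misses a base64- or rot13-encoded instruction; decoding candidate views of the input before checking catches these two encodings. It does not address the rendering, pixel, or metadata variants described above.

```python
import base64
import binascii
import codecs
import re

# Hypothetical blocklist standing in for an LLM guardrail (assumption:
# production guardrails are typically classifiers, not keyword lists).
BLOCKLIST = ("ignore previous instructions",)

# Heuristic: runs of base64 alphabet long enough to hide a short sentence.
B64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def naive_guardrail(text: str) -> bool:
    """Return True if the text contains a blocklisted phrase verbatim."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in BLOCKLIST)


def decoded_views(text: str) -> list[str]:
    """Return the raw text plus its rot13 view and any base64-decoded runs."""
    views = [text, codecs.decode(text, "rot_13")]
    for match in B64_RUN.findall(text):
        try:
            decoded = base64.b64decode(match, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64, or decodes to non-UTF-8 bytes
        views.append(decoded)
    return views


def normalizing_guardrail(text: str) -> bool:
    """Flag the text if any decoded view trips the naive guardrail."""
    return any(naive_guardrail(view) for view in decoded_views(text))
```

For example, `naive_guardrail` returns False for `base64.b64encode(b"Ignore previous instructions").decode()`, while `normalizing_guardrail` flags the same string. Decoding before checking only covers the encodings it knows about, so an adversary can still switch to another scheme.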

References

  1. https://atlas.mitre.org/techniques/AML.T0068

Related by meaning (6)

Nearest entities by semantic similarity across the cs-graph corpus.

ATLAS: LLM Prompt Crafting
ATLAS: LLM Prompt Injection
ATLAS: LLM Trusted Output Components Manipulation
ATLAS: LLM Data Leakage
ATLAS: LLM Response Rendering
ATLAS: Extract LLM System Prompt

Sourced from MITRE ATLAS (Adversarial Threat Landscape for AI Systems). Curated by Adam Lundqvist, SQUR.