Mitigation

AML.M0020 Generative AI Guardrails

What it is

Guardrails are safety controls placed between a generative AI model and its users. They screen both the prompts sent to the model and the output returned, so that undesired inputs and outputs are blocked. Guardrails can take the form of validators, such as filters, rule-based logic, or regular expressions, or of AI-based approaches, such as classifiers, LLM-based evaluators, or named entity recognition (NER), that assess the safety of a prompt or response. Domain-specific methods can be employed to reduce risks in areas such as etiquette, brand damage, jailbreaking, false information, code exploits, SQL injections, and data leakage.
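As a rough illustration of the validator style of guardrail described above, the sketch below wraps a model call with a regex-based input check and an output redaction step. Everything in it is an assumption made for illustration: the rule names, the patterns, the `guarded_generate` helper, and the `model` callable are hypothetical and are not part of MITRE ATLAS or of any particular library. Real deployments would typically combine rules like these with classifier- or LLM-based checks.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical input rules: block prompts that look like jailbreaks or SQL injection.
INPUT_RULES = {
    "jailbreak": re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    "sql_injection": re.compile(
        r"(;\s*drop\s+table|\bunion\s+select\b|'\s*or\s+1\s*=\s*1)", re.IGNORECASE
    ),
}

# Hypothetical output rules: redact strings that look like leaked personal data.
OUTPUT_RULES = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


@dataclass
class GuardrailResult:
    allowed: bool          # False when the prompt was blocked before reaching the model
    text: str              # Text that is safe to show the user
    violations: List[str]  # Names of the rules that fired


def check_input(prompt: str) -> List[str]:
    """Return the names of all input rules that match the prompt."""
    return [name for name, pattern in INPUT_RULES.items() if pattern.search(prompt)]


def redact_output(response: str) -> Tuple[str, List[str]]:
    """Replace every output-rule match with a placeholder and report which rules fired."""
    violations = []
    for name, pattern in OUTPUT_RULES.items():
        response, count = pattern.subn(f"[REDACTED {name}]", response)
        if count:
            violations.append(name)
    return response, violations


def guarded_generate(prompt: str, model: Callable[[str], str]) -> GuardrailResult:
    """Run the input guardrail, call the model, then run the output guardrail."""
    hits = check_input(prompt)
    if hits:
        # The model is never called for a blocked prompt.
        return GuardrailResult(False, "Request blocked by input guardrail.", hits)
    safe_text, violations = redact_output(model(prompt))
    return GuardrailResult(True, safe_text, violations)
```

Here `model` stands in for any function that maps a prompt string to a response string. Blocking on the input side and redacting on the output side is only one possible design. A stricter policy could refuse the whole response whenever an output rule fires.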

References

  1. https://atlas.mitre.org/mitigations/AML.M0020

Related by meaning · 6

Nearest entities by semantic similarity across the cs-graph corpus.

  1. Generative AI Guidelines (ATLAS mitigation)
  2. Generative AI Model Alignment (ATLAS mitigation)
  3. Adversarial Input Detection (ATLAS mitigation)
  4. Model Hardening (ATLAS mitigation)
  5. Control Access to AI Models and Data in Production (ATLAS mitigation)
  6. Validate AI Model (ATLAS mitigation)
Sourced from MITRE ATLAS — Adversarial Threat Landscape for AI Systems. Curated by Adam Lundqvist, SQUR.