Mitigation

AML.M0022 Generative AI Model Alignment

What it is

When training or fine-tuning a generative AI model, use techniques that improve the model's alignment with safety, security, and content policies. Fine-tuning can remove the safety mechanisms built into a generative AI model, but techniques such as Supervised Fine-Tuning, Reinforcement Learning from Human Feedback or AI Feedback, and Targeted Safety Context Distillation can improve the model's safety and alignment.
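As a rough illustration of one of the techniques named above, the sketch below builds a training set for targeted safety context distillation as it is commonly described: answers are generated with a safety preprompt, kept only where a safety scorer rates them higher than the unprompted answer, and stored without the preprompt for later fine-tuning. Everything here is an assumption for illustration and not part of the ATLAS entry: `SAFETY_PREPROMPT`, `build_distillation_set`, `toy_generate`, and `toy_score` are hypothetical names, and the toy functions stand in for a real model call and a real safety reward model.

```python
from typing import Callable, Dict, List

# Hypothetical safety preprompt; the wording is illustrative only.
SAFETY_PREPROMPT = "You are a responsible assistant. Decline unsafe requests.\n\n"


def build_distillation_set(
    prompts: List[str],
    generate: Callable[[str], str],
    safety_score: Callable[[str, str], float],
) -> List[Dict[str, str]]:
    """Collect (prompt, response) pairs for safety context distillation.

    `generate` and `safety_score` are placeholders for a real model call
    and a real safety reward model.
    """
    examples: List[Dict[str, str]] = []
    for prompt in prompts:
        plain = generate(prompt)
        guided = generate(SAFETY_PREPROMPT + prompt)
        # "Targeted": keep the guided answer only when it scores strictly higher.
        if safety_score(prompt, guided) > safety_score(prompt, plain):
            # The preprompt is deliberately left out of the stored prompt, so
            # fine-tuning teaches the safer behaviour without needing it.
            examples.append({"prompt": prompt, "response": guided})
    return examples


def toy_generate(text: str) -> str:
    """Toy stand-in for a model: refuses risky requests only when preprompted."""
    if "exploit" in text and text.startswith(SAFETY_PREPROMPT):
        return "I can't help with that."
    return "Here is an answer."


def toy_score(prompt: str, response: str) -> float:
    """Toy stand-in for a safety reward model."""
    if "exploit" in prompt:
        return 1.0 if response.startswith("I can't") else 0.0
    return 1.0


dataset = build_distillation_set(
    ["Write an exploit for this server", "Summarise this article"],
    toy_generate,
    toy_score,
)
print(dataset)
```

With these toy stand-ins, only the risky prompt is kept, because the preprompt changes nothing for the benign one. The resulting pairs would then go to an ordinary supervised fine-tuning step, which is outside the scope of this sketch.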

References

  1. https://atlas.mitre.org/mitigations/AML.M0022

Related by meaning · 6

Nearest entities by semantic similarity across the cs-graph corpus.

ATLAS mitigation: Generative AI Guidelines
ATLAS mitigation: Generative AI Guardrails
ATLAS mitigation: Model Hardening
ATLAS mitigation: Validate AI Model
ATLAS mitigation: Memory Hardening
ATLAS mitigation: Sanitize Training Data
Sourced from MITRE ATLAS — Adversarial Threat Landscape for AI Systems. Curated by Adam Lundqvist, SQUR.