A structured matrix of risk categories, guardrail controls, and implementation types for large language model deployments. Designed for use by AI governance, security, and product teams evaluating LLM risk posture.
01 Input Risks
Risks introduced through user input, prompt construction, or data passed to the model at inference time.
| Risk | Severity | Guardrail Controls | Type | NIST AI RMF Ref |
|---|---|---|---|---|
| **Prompt Injection**: Malicious instructions embedded in user input that override system prompts or hijack model behavior | Critical | Input sanitization and pattern detection layer<br>Strict separation of system prompt from user input<br>Adversarial prompt test suite in pre-deployment<br>Output monitoring for instruction-following anomalies | Preventive | MAP 5.1, MANAGE 2.2 |
| **Indirect Prompt Injection**: Malicious instructions embedded in documents, web pages, or external data sources retrieved by the model | Critical | Treat all retrieved content as untrusted input<br>Sanitize external data before inclusion in context<br>Restrict agent tool use to approved, monitored sources<br>Limit agent action scope when processing external content | Preventive | MAP 5.1, GOVERN 6.1 |
| **Sensitive Data in Prompts**: PII, credentials, or confidential data submitted in user prompts and logged or retained by the LLM provider | High | PII detection on input before API transmission<br>User education on data classification in AI tools<br>Data residency and retention review for LLM vendors<br>Prohibited input categories defined in acceptable use policy | Preventive | GOVERN 1.7, MAP 2.3 |
| **Jailbreaking**: Adversarial prompting techniques designed to bypass model safety training and produce prohibited content | High | Output content classifiers on model responses<br>Rate limiting and behavioral anomaly detection<br>Regular red-team exercises targeting known jailbreak patterns<br>Escalation path for flagged interactions | Detective | MEASURE 2.5, MANAGE 1.3 |
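Several of the preventive controls above (input sanitization, pattern detection, PII screening before API transmission) can be combined into a single pre-flight check on user input. The sketch below is a minimal illustration under stated assumptions: the regex patterns and the `preflight_check` function name are hypothetical, and a production deployment would use a trained classifier or vendor guardrail service rather than regexes alone.

```python
import re

# Illustrative patterns only; not an exhaustive or recommended list.
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"disregard the system prompt",
]
PII_PATTERNS = {
    "ssn": r"\b\d{3}-\d{2}-\d{4}\b",
    "email": r"\b[\w.+-]+@[\w-]+\.[\w.]+\b",
    "api_key": r"\b(sk|pk)[-_][A-Za-z0-9]{16,}\b",
}

def preflight_check(user_input: str) -> dict:
    """Screen user input before it is sent to the LLM API.

    Returns a findings dict: 'injection' is True when a likely
    override phrase is present; 'pii' lists detected PII categories
    to redact or reject per the acceptable use policy.
    """
    findings = {"injection": False, "pii": []}
    for pat in INJECTION_PATTERNS:
        if re.search(pat, user_input, re.IGNORECASE):
            findings["injection"] = True
            break
    for label, pat in PII_PATTERNS.items():
        if re.search(pat, user_input):
            findings["pii"].append(label)
    return findings
```

A check like this belongs in front of the API call, so that flagged input is blocked or redacted before it ever reaches the provider's logs.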
02 Output Risks
Risks arising from model-generated content, including factual errors, harmful outputs, and unintended disclosures.
| Risk | Severity | Guardrail Controls | Type | NIST AI RMF Ref |
|---|---|---|---|---|
| **Hallucination**: Model generates plausible but factually incorrect or fabricated information presented with apparent confidence | Critical | Retrieval-augmented generation (RAG) for factual use cases<br>Human review checkpoints for high-stakes outputs<br>Citation requirements and source attribution prompting<br>Confidence scoring where available; low-confidence flagging<br>User-facing disclosure of AI-generated content status | Preventive / Corrective | MEASURE 2.1, MANAGE 2.4 |
| **Harmful Content Generation**: Model produces content that is dangerous, illegal, discriminatory, or violates organizational acceptable use standards | Critical | Content safety classifier on all outputs<br>System prompt with explicit prohibited output categories<br>Human review queue for classifier-flagged outputs<br>Incident logging and vendor notification process | Detective | GOVERN 1.2, MAP 5.2 |
| **Biased or Discriminatory Output**: Model outputs that reflect or amplify demographic, racial, gender, or other biases encoded in training data | High | Bias evaluation across demographic dimensions pre-deployment<br>Ongoing output sampling and bias audit schedule<br>Documented escalation process for bias incidents<br>Vendor bias testing documentation required at intake | Detective | MAP 5.1, MEASURE 2.2 |
| **Training Data Disclosure**: Model reproduces verbatim or near-verbatim content from training data, including copyrighted material or PII | Medium | Output scanning for known PII patterns<br>Verbatim-string detection against known copyrighted content<br>Legal review of vendor model cards and training data documentation | Detective | GOVERN 6.2, MAP 2.3 |
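The detective controls in this table share a common shape: scan each model response and route flagged outputs to a human review queue instead of the end user. A minimal sketch of that pattern follows; the `screen_output` name, the regexes, and the prohibited-term list are illustrative assumptions, standing in for a real content safety classifier.

```python
import re
from dataclasses import dataclass

# Illustrative PII patterns for output scanning; a real deployment
# would use a dedicated PII/content-safety classifier.
PII_OUTPUT_PATTERNS = {
    "ssn": r"\b\d{3}-\d{2}-\d{4}\b",
    "phone": r"\b\(?\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b",
}

@dataclass
class OutputVerdict:
    allowed: bool      # False => hold for human review, do not deliver
    reasons: list      # which checks fired, for the incident log

def screen_output(text: str,
                  prohibited_terms=("credit card number",)) -> OutputVerdict:
    """Detective check on a model response before delivery.

    Any PII pattern match or prohibited term sends the response to
    the human review queue and the incident log rather than the user.
    """
    reasons = []
    for label, pat in PII_OUTPUT_PATTERNS.items():
        if re.search(pat, text):
            reasons.append(f"pii:{label}")
    lowered = text.lower()
    for term in prohibited_terms:
        if term in lowered:
            reasons.append(f"prohibited:{term}")
    return OutputVerdict(allowed=not reasons, reasons=reasons)
```

Keeping the verdict and its reasons together supports the incident-logging and vendor-notification controls above: the same record that blocks delivery also documents why.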
03 Operational and Integration Risks
Risks arising from how the LLM is integrated into systems, workflows, and organizational processes.
| Risk | Severity | Guardrail Controls | Type | NIST AI RMF Ref |
|---|---|---|---|---|
| **Over-Reliance / Automation Bias**: Users accept LLM outputs without critical review, substituting AI judgment for human oversight in consequential decisions | High | Mandatory human-in-the-loop for decisions affecting people<br>UI design that surfaces AI-generated vs. human-verified content<br>User training on LLM limitations and appropriate use<br>Defined prohibited use cases where AI must not be the final arbiter | Preventive | GOVERN 1.4, MANAGE 4.1 |
| **Model Drift**: Degradation of model performance over time as real-world data distribution shifts away from training conditions | Medium | Baseline performance metrics established at deployment<br>Scheduled drift monitoring against defined thresholds<br>Automated alerts when output quality metrics degrade<br>Defined re-evaluation or replacement triggers | Detective | MEASURE 1.1, MANAGE 3.2 |
| **API and Integration Exposure**: LLM API access expands the organizational attack surface; insecure integrations expose internal data or enable abuse | High | API key rotation and secrets management enforcement<br>Rate limiting and abuse detection on all LLM endpoints<br>Network segmentation for LLM API traffic<br>Logging of all API calls with anomaly alerting | Preventive | GOVERN 6.1, MAP 3.1 |
| **Supply Chain Risk**: Third-party LLM providers, fine-tuning partners, or plugin/tool vendors introduce risks outside organizational control | High | Vendor risk assessment at intake and annual review<br>Review of vendor model cards, safety evaluations, and incident history<br>Contractual provisions for breach notification and data handling<br>Contingency plan for vendor discontinuation or compromise | Preventive | GOVERN 6.2, MAP 3.5 |
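The model drift controls above pair a deployment-time baseline with scheduled monitoring against a defined threshold. A minimal sketch of that comparison, assuming quality scores in [0, 1] (e.g., from a periodic evaluation set); the `check_drift` name and the `max_drop` threshold are illustrative placeholders, not recommended values:

```python
from statistics import mean

def check_drift(baseline_scores, recent_scores, max_drop=0.05):
    """Compare recent quality metrics against the deployment baseline.

    Returns (drifted, delta): drifted is True when the mean recent
    score falls more than `max_drop` below the baseline mean, which
    should trigger the alerting and re-evaluation controls above.
    """
    base = mean(baseline_scores)
    recent = mean(recent_scores)
    delta = base - recent
    return delta > max_drop, delta
```

In practice the threshold, evaluation cadence, and metric set would be defined per use case in the deployment's baseline documentation, and the alert would feed the defined re-evaluation or replacement trigger.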
This matrix is designed as a living document. Controls should be reviewed following any model upgrade, integration change, or security incident. NIST AI RMF references map to the Govern, Map, Measure, and Manage functions and should be cross-referenced with your organization's AI RMF profile.