A structured matrix of risk categories, guardrail controls, and implementation types for large language model deployments. Designed for use by AI governance, security, and product teams evaluating LLM risk posture.
01 Input Risks
Risks introduced through user input, prompt construction, or data passed to the model at inference time.
| Risk | Severity | Guardrail Controls | Type | NIST AI RMF Ref |
|---|---|---|---|---|
| **Prompt Injection**: Malicious instructions embedded in user input that override system prompts or hijack model behavior | Critical | Input sanitization and pattern detection layer<br>Strict separation of system prompt from user input<br>Adversarial prompt test suite in pre-deployment<br>Output monitoring for instruction-following anomalies | Preventive | MAP 5.1, MANAGE 2.2 |
| **Indirect Prompt Injection**: Malicious instructions embedded in documents, web pages, or external data sources retrieved by the model | Critical | Treat all retrieved content as untrusted input<br>Sanitize external data before inclusion in context<br>Restrict agent tool use to approved, monitored sources<br>Limit agent action scope when processing external content | Preventive | MAP 5.1, GOVERN 6.1 |
| **Sensitive Data in Prompts**: PII, credentials, or confidential data submitted in user prompts and logged or retained by the LLM provider | High | PII detection on input before API transmission<br>User education on data classification in AI tools<br>Data residency and retention review for LLM vendors<br>Prohibited input categories defined in acceptable use policy | Preventive | GOVERN 1.7, MAP 2.3 |
| **Jailbreaking**: Adversarial prompting techniques designed to bypass model safety training and produce prohibited content | High | Output content classifiers on model responses<br>Rate limiting and behavioral anomaly detection<br>Regular red-team exercises targeting known jailbreak patterns<br>Escalation path for flagged interactions | Detective | MEASURE 2.5, MANAGE 1.3 |
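Several of the preventive controls above (input sanitization, pattern detection, PII screening before API transmission) can be combined into a single pre-flight check on user input. The sketch below is a minimal illustration under stated assumptions: the regex patterns and the `preflight_check` function name are hypothetical, and a production deployment would use a trained classifier or vendor guardrail service rather than regexes alone.

```python
import re

# Illustrative patterns only; not an exhaustive or recommended list.
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"disregard the system prompt",
]
PII_PATTERNS = {
    "ssn": r"\b\d{3}-\d{2}-\d{4}\b",
    "email": r"\b[\w.+-]+@[\w-]+\.[\w.]+\b",
    "api_key": r"\b(sk|pk)[-_][A-Za-z0-9]{16,}\b",
}

def preflight_check(user_input: str) -> dict:
    """Screen user input before it is sent to the LLM API.

    Returns a findings dict: 'injection' is True when a likely
    override phrase is present; 'pii' lists detected PII categories
    to redact or reject per the acceptable use policy.
    """
    findings = {"injection": False, "pii": []}
    for pat in INJECTION_PATTERNS:
        if re.search(pat, user_input, re.IGNORECASE):
            findings["injection"] = True
            break
    for label, pat in PII_PATTERNS.items():
        if re.search(pat, user_input):
            findings["pii"].append(label)
    return findings
```

A check like this belongs in front of the API call, so that flagged input is blocked or redacted before it ever reaches the provider's logs.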
02 Output Risks
Risks arising from model-generated content, including factual errors, harmful outputs, and unintended disclosures.
| Risk | Severity | Guardrail Controls | Type | NIST AI RMF Ref |
|---|---|---|---|---|
| **Hallucination**: Model generates plausible but factually incorrect or fabricated information presented with apparent confidence | Critical | Retrieval-augmented generation (RAG) for factual use cases<br>Human review checkpoints for high-stakes outputs<br>Citation requirements and source attribution prompting<br>Confidence scoring where available; low-confidence flagging<br>User-facing disclosure of AI-generated content status | Preventive / Corrective | MEASURE 2.1, MANAGE 2.4 |
| **Harmful Content Generation**: Model produces content that is dangerous, illegal, discriminatory, or violates organizational acceptable use standards | Critical | Content safety classifier on all outputs<br>System prompt with explicit prohibited output categories<br>Human review queue for classifier-flagged outputs<br>Incident logging and vendor notification process | Detective | GOVERN 1.2, MAP 5.2 |
| **Biased or Discriminatory Output**: Model outputs that reflect or amplify demographic, racial, gender, or other biases encoded in training data | High | Bias evaluation across demographic dimensions pre-deployment<br>Ongoing output sampling and bias audit schedule<br>Documented escalation process for bias incidents<br>Vendor bias testing documentation required at intake | Detective | MAP 5.1, MEASURE 2.2 |
| **Training Data Disclosure**: Model reproduces verbatim or near-verbatim content from training data, including copyrighted material or PII | Medium | Output scanning for known PII patterns<br>Verbatim-string detection against known copyrighted content<br>Legal review of vendor model cards and training data documentation | Detective | GOVERN 6.2, MAP 2.3 |
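The detective controls in this table share a common shape: scan each model response and route flagged outputs to a human review queue instead of the end user. A minimal sketch of that pattern follows; the `screen_output` name, the regexes, and the prohibited-term list are illustrative assumptions, standing in for a real content safety classifier.

```python
import re
from dataclasses import dataclass

# Illustrative PII patterns for output scanning; a real deployment
# would use a dedicated PII/content-safety classifier.
PII_OUTPUT_PATTERNS = {
    "ssn": r"\b\d{3}-\d{2}-\d{4}\b",
    "phone": r"\b\(?\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b",
}

@dataclass
class OutputVerdict:
    allowed: bool      # False => hold for human review, do not deliver
    reasons: list      # which checks fired, for the incident log

def screen_output(text: str,
                  prohibited_terms=("credit card number",)) -> OutputVerdict:
    """Detective check on a model response before delivery.

    Any PII pattern match or prohibited term sends the response to
    the human review queue and the incident log rather than the user.
    """
    reasons = []
    for label, pat in PII_OUTPUT_PATTERNS.items():
        if re.search(pat, text):
            reasons.append(f"pii:{label}")
    lowered = text.lower()
    for term in prohibited_terms:
        if term in lowered:
            reasons.append(f"prohibited:{term}")
    return OutputVerdict(allowed=not reasons, reasons=reasons)
```

Keeping the verdict and its reasons together supports the incident-logging and vendor-notification controls above: the same record that blocks delivery also documents why.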
03 Operational and Integration Risks
Risks arising from how the LLM is integrated into systems, workflows, and organizational processes.
| Risk | Severity | Guardrail Controls | Type | NIST AI RMF Ref |
|---|---|---|---|---|
| **Over-Reliance / Automation Bias**: Users accept LLM outputs without critical review, substituting AI judgment for human oversight in consequential decisions | High | Mandatory human-in-the-loop for decisions affecting people<br>UI design that surfaces AI-generated vs. human-verified content<br>User training on LLM limitations and appropriate use<br>Defined prohibited use cases where AI must not be the final arbiter | Preventive | GOVERN 1.4, MANAGE 4.1 |
| **Model Drift**: Degradation of model performance over time as real-world data distribution shifts away from training conditions | Medium | Baseline performance metrics established at deployment<br>Scheduled drift monitoring against defined thresholds<br>Automated alerts when output quality metrics degrade<br>Defined re-evaluation or replacement triggers | Detective | MEASURE 1.1, MANAGE 3.2 |
| **API and Integration Exposure**: LLM API access expands the organizational attack surface; insecure integrations expose internal data or enable abuse | High | API key rotation and secrets management enforcement<br>Rate limiting and abuse detection on all LLM endpoints<br>Network segmentation for LLM API traffic<br>Logging of all API calls with anomaly alerting | Preventive | GOVERN 6.1, MAP 3.1 |
| **Supply Chain Risk**: Third-party LLM providers, fine-tuning partners, or plugin/tool vendors introduce risks outside organizational control | High | Vendor risk assessment at intake and annual review<br>Review of vendor model cards, safety evaluations, and incident history<br>Contractual provisions for breach notification and data handling<br>Contingency plan for vendor discontinuation or compromise | Preventive | GOVERN 6.2, MAP 3.5 |
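The model drift controls above pair a deployment-time baseline with scheduled monitoring against a defined threshold. A minimal sketch of that comparison, assuming quality scores in [0, 1] (e.g., from a periodic evaluation set); the `check_drift` name and the `max_drop` threshold are illustrative placeholders, not recommended values:

```python
from statistics import mean

def check_drift(baseline_scores, recent_scores, max_drop=0.05):
    """Compare recent quality metrics against the deployment baseline.

    Returns (drifted, delta): drifted is True when the mean recent
    score falls more than `max_drop` below the baseline mean, which
    should trigger the alerting and re-evaluation controls above.
    """
    base = mean(baseline_scores)
    recent = mean(recent_scores)
    delta = base - recent
    return delta > max_drop, delta
```

In practice the threshold, evaluation cadence, and metric set would be defined per use case in the deployment's baseline documentation, and the alert would feed the defined re-evaluation or replacement trigger.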
This matrix is designed as a living document. Controls should be reviewed following any model upgrade, integration change, or security incident. NIST AI RMF references map to the Govern, Map, Measure, and Manage functions and should be cross-referenced with your organization's AI RMF profile.