HANALEI.DEV PORTFOLIO / Risk Framework

LLM Risk Controls Matrix

Document Type Controls Matrix
Focus Guardrails mamp; Mitigations
Framework Alignment NIST AI RMF, OWASP LLM Top 10
Version 1.0

A structured matrix of risk categories, guardrail controls, and implementation types for large language model deployments. Designed for use by AI governance, security, and product teams evaluating LLM risk posture.

Critical Risk
High Risk
Medium Risk
Low Risk
Preventive
Detective
Corrective

01 Input Risks

Risks introduced through user input, prompt construction, or data passed to the model at inference time.

Risk Severity Guardrail Controls Type NIST AI RMF Ref
Prompt Injection Malicious instructions embedded in user input that override system prompts or hijack model behavior Critical
Input sanitization and pattern detection layer Strict system prompt separation from user input Adversarial prompt test suite in pre-deployment Output monitoring for instruction-following anomalies
Preventive MAP 5.1, MANAGE 2.2
Indirect Prompt Injection Malicious instructions embedded in documents, web pages, or external data sources retrieved by the model Critical
Treat all retrieved content as untrusted input Sanitize external data before inclusion in context Restrict agent tool use to approved, monitored sources Limit agent action scope when processing external content
Preventive MAP 5.1, GOVERN 6.1
Sensitive Data in Prompts PII, credentials, or confidential data submitted in user prompts and logged or retained by the LLM provider High
PII detection on input before API transmission User education on data classification in AI tools Data residency and retention review for LLM vendors Prohibited input categories defined in acceptable use policy
Preventive GOVERN 1.7, MAP 2.3
Jailbreaking Adversarial prompting techniques designed to bypass model safety training and produce prohibited content High
Output content classifiers on model responses Rate limiting and behavioral anomaly detection Regular red team exercises targeting known jailbreak patterns Escalation path for flagged interactions
Detective MEASURE 2.5, MANAGE 1.3

02 Output Risks

Risks arising from model-generated content, including factual errors, harmful outputs, and unintended disclosures.

Risk Severity Guardrail Controls Type NIST AI RMF Ref
Hallucination Model generates plausible but factually incorrect or fabricated information presented with apparent confidence Critical
Retrieval-augmented generation (RAG) for factual use cases Human review checkpoints for high-stakes outputs Citation requirements and source attribution prompting Confidence scoring where available; low-confidence flagging User-facing disclosure of AI-generated content status
Preventive

Corrective
MEASURE 2.1, MANAGE 2.4
Harmful Content Generation Model produces content that is dangerous, illegal, discriminatory, or violates organizational acceptable use standards Critical
Content safety classifier on all outputs System prompt with explicit prohibited output categories Human review queue for classifier-flagged outputs Incident logging and vendor notification process
Detective GOVERN 1.2, MAP 5.2
Biased or Discriminatory Output Model outputs that reflect or amplify demographic, racial, gender, or other biases encoded in training data High
Bias evaluation across demographic dimensions pre-deployment Ongoing output sampling and bias audit schedule Documented escalation process for bias incidents Vendor bias testing documentation required at intake
Detective MAP 5.1, MEASURE 2.2
Training Data Disclosure Model reproduces verbatim or near-verbatim content from training data including copyrighted material or PII Medium
Output scanning for known PII patterns Copyright detection in content with verbatim strings Legal review of vendor model cards and training data documentation
Detective GOVERN 6.2, MAP 2.3

03 Operational and Integration Risks

Risks arising from how the LLM is integrated into systems, workflows, and organizational processes.

Risk Severity Guardrail Controls Type NIST AI RMF Ref
Over-Reliance / Automation Bias Users accept LLM outputs without critical review, substituting AI judgment for human oversight in consequential decisions High
Mandatory human-in-the-loop for decisions affecting people UI design that surfaces AI-generated vs human-verified content User training on LLM limitations and appropriate use Defined prohibited use cases where AI must not be the final arbiter
Preventive GOVERN 1.4, MANAGE 4.1
Model Drift Degradation of model performance over time as real-world data distribution shifts away from training conditions Medium
Baseline performance metrics established at deployment Scheduled drift monitoring against defined thresholds Automated alerts when output quality metrics degrade Defined re-evaluation or replacement triggers
Detective MEASURE 1.1, MANAGE 3.2
API and Integration Exposure LLM API access expands organizational attack surface; insecure integrations expose internal data or enable abuse High
API key rotation and secrets management enforcement Rate limiting and abuse detection on all LLM endpoints Network segmentation for LLM API traffic Logging of all API calls with anomaly alerting
Preventive GOVERN 6.1, MAP 3.1
Supply Chain Risk Third-party LLM providers, fine-tuning partners, or plugin/tool vendors introduce risks outside organizational control High
Vendor risk assessment at intake and annual review Review of vendor model cards, safety evaluations, and incident history Contractual provisions for breach notification and data handling Contingency plan for vendor discontinuation or compromise
Preventive GOVERN 6.2, MAP 3.5
Implementation Note

This matrix is designed as a living document. Controls should be reviewed following any model upgrade, integration change, or security incident. NIST AI RMF references map to the Govern, Map, Measure, and Manage functions and should be cross-referenced with your organization's AI RMF profile.