HANALEI.DEV PORTFOLIO / Security Research

AI Model Risk Taxonomy

A structured taxonomy of AI-specific risks including data poisoning, model inversion, hallucination, and misuse, mapped against security controls and governance accountability touchpoints across the AI development lifecycle.

Document Type: Risk Taxonomy
Reference: OWASP LLM Top 10, NIST AI RMF
Audience: Security, Governance, AI Teams
Version: 1.0

AI Development Lifecycle Reference

Risks are mapped to the lifecycle stage where they originate or are most effectively controlled. Each taxonomy entry below includes lifecycle stage tags for quick reference.

1. Data Collection: sourcing, labeling, consent
2. Model Training: architecture, optimization, tuning
3. Evaluation: testing, red-teaming, benchmarks
4. Deployment: integration, access, monitoring
5. Operations: inference, user interaction
6. Decommission: retirement, data disposal

Training & Data Integrity Risks

Data Poisoning (Severity: Critical)
Malicious injection of corrupted training data to manipulate model behavior at inference time. An attacker introduces manipulated samples into the training pipeline, causing the model to learn a backdoor trigger or degrade performance on specific inputs. Difficult to detect after training.
Security Controls: data provenance tracking for all training sources; input validation and anomaly detection on training pipelines; differential privacy techniques to limit single-sample influence; canary data injection for detection.
Governance Touchpoint: Data Governance Lead, MLOps team. Requires sign-off on data sources at intake. NIST AI RMF MAP 2.3.
Lifecycle Stage: Data Collection, Training.
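As an illustrative sketch of the canary control: seed the training set with synthetic records carrying a sentinel label, then check after training whether the model reproduces it. The function names and sentinel feature values below are hypothetical, not any specific pipeline's API.

```python
import random

def inject_canaries(dataset, n_canaries=5, seed=0):
    """Append synthetic (features, label) canaries with a sentinel label.

    `dataset` is a list of (features, label) pairs; the feature values
    (9999 + i) are hypothetical out-of-range sentinels.
    """
    rng = random.Random(seed)
    canaries = [((9999 + i, rng.random()), "CANARY") for i in range(n_canaries)]
    return dataset + canaries, canaries

def compromised_canaries(predict, canaries):
    """After training, return the canaries whose sentinel label the model
    reproduces, evidence that the pipeline ingested them unfiltered."""
    return [c for c in canaries if predict(c[0]) == c[1]]
```

A model that never emits the sentinel label for canary inputs gives some assurance the pipeline filtered or diluted the injected records.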
Training Data Bias (Severity: Critical)
Systematic skew in training data that causes the model to produce discriminatory or unfair outputs across demographic groups. Under-representation or misrepresentation of demographic groups, geographic regions, or time periods in training data produces outputs that disadvantage affected populations at scale.
Security Controls: demographic parity and disparate impact testing; stratified sampling and balanced dataset curation; bias evaluation benchmarks before deployment; ongoing output monitoring post-deployment.
Governance Touchpoint: AI Governance Committee, Legal (EU AI Act Art. 10), HR for employment use cases. NIST AI RMF MEASURE 2.2.
Lifecycle Stage: Data Collection, Evaluation.
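A minimal disparate-impact check, assuming binary favorable/unfavorable outcomes recorded per group; the four-fifths threshold mentioned in the comment is a common heuristic from US employment practice, not a universal legal standard.

```python
def selection_rates(outcomes):
    """`outcomes` maps group name -> list of 0/1 favorable-outcome flags."""
    return {g: sum(v) / len(v) for g, v in outcomes.items()}

def disparate_impact_ratio(outcomes, reference_group):
    """Ratio of each group's favorable-outcome rate to the reference
    group's rate. Values below ~0.8 are a common red flag (the
    'four-fifths rule'), an illustrative threshold only."""
    rates = selection_rates(outcomes)
    ref = rates[reference_group]
    return {g: (r / ref if ref else float("nan")) for g, r in rates.items()}
```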
Intellectual Property Leakage (Severity: High)
The model memorizes and reproduces copyrighted or proprietary content from training data. LLMs trained on large corpora can memorize verbatim text from training data, including copyrighted works, PII, or proprietary documents, and reproduce it in outputs.
Security Controls: training data licensing audit and documentation; membership inference testing during evaluation; output filtering for verbatim reproduction patterns; copyright detection on outputs in production.
Governance Touchpoint: Legal review of training data sourcing. Required under EU AI Act Art. 53 for GPAI models. NIST AI RMF GOVERN 6.2.
Lifecycle Stage: Training, Operations.
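One way to sketch the verbatim-reproduction filter is word n-gram overlap against a protected corpus. Production systems typically use hashing or suffix structures at scale; the function names here are illustrative.

```python
def ngrams(text, n=8):
    """Set of word n-grams in `text` (whitespace tokenization)."""
    toks = text.split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def verbatim_overlap(output_text, corpus_texts, n=8):
    """Fraction of the output's word n-grams appearing verbatim in any
    protected document; a high value suggests memorized reproduction."""
    out = ngrams(output_text, n)
    if not out:
        return 0.0
    corpus = set()
    for doc in corpus_texts:
        corpus |= ngrams(doc, n)
    return len(out & corpus) / len(out)
```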
Label Manipulation (Severity: High)
Corruption or adversarial tampering with human-annotated labels used in supervised training. In annotation pipelines relying on crowdsourced or third-party labelers, adversarial or low-quality labels can systematically distort model behavior in targeted ways.
Security Controls: inter-annotator agreement monitoring; annotation quality audits and adversarial labeler detection; redundant labeling for high-stakes categories.
Governance Touchpoint: Data Governance Lead. Vendor contracts must include labeling quality SLAs. NIST AI RMF MAP 2.3.
Lifecycle Stage: Data Collection, Training.
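Inter-annotator agreement monitoring can be sketched with Cohen's kappa between two labelers; a labeler whose kappa against the pool sits near chance (kappa around 0) is a candidate for audit.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators
    over the same items. 1.0 is perfect agreement, 0.0 is chance level."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    expected = sum(ca[k] * cb.get(k, 0) for k in ca) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)
```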

Model Extraction & Privacy Attacks

Model Inversion (Severity: Critical)
An attacker reconstructs sensitive training data by querying model outputs. By systematically querying a model and analyzing its confidence scores or outputs, an attacker can reconstruct training samples, including private individual records, medical data, or facial images used in training.
Security Controls: differential privacy in training to limit information leakage; output confidence score suppression or noise injection; API rate limiting and anomaly detection on query patterns; membership inference auditing before deployment.
Governance Touchpoint: CISO, Data Privacy Officer. Required disclosure under GDPR if training data includes personal data. NIST AI RMF MAP 5.1.
Lifecycle Stage: Training, Deployment, Operations.
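A sketch of confidence-score suppression: truncate precision and optionally add noise before returning probabilities, reducing the signal available to inversion attackers. The parameter choices are illustrative; a real deployment would tune them against utility loss.

```python
import random

def harden_scores(probs, decimals=1, noise_scale=0.0, seed=None):
    """Round returned class probabilities to coarse precision, optionally
    add small uniform noise, and renormalize. Both steps limit the
    fine-grained confidence signal inversion attacks rely on."""
    rng = random.Random(seed)
    noisy = [max(p + rng.uniform(-noise_scale, noise_scale), 0.0) for p in probs]
    rounded = [round(p, decimals) for p in noisy]
    total = sum(rounded)
    return [p / total for p in rounded] if total else rounded
```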
Model Extraction (Severity: High)
An attacker reconstructs a functional copy of a proprietary model by querying its API. Through systematic input-output queries, an adversary can train a surrogate model that approximates the behavior of the proprietary model, enabling IP theft and further adversarial attacks without access to the original.
Security Controls: query rate limiting and anomaly detection; watermarking model outputs for attribution; API access controls and authentication; terms of service prohibiting systematic querying.
Governance Touchpoint: Legal (IP protection), CISO (API security). NIST AI RMF MANAGE 2.2.
Lifecycle Stage: Deployment, Operations.
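Query rate limiting, the first line of defense against extraction, can be sketched as a per-client sliding window; sustained high-volume, systematic querying is the main precursor signal of an extraction attempt.

```python
from collections import deque

class SlidingWindowLimiter:
    """Per-client sliding-window query limiter (illustrative sketch)."""

    def __init__(self, max_queries, window_seconds):
        self.max_queries = max_queries
        self.window = window_seconds
        self.events = {}  # client_id -> deque of request timestamps

    def allow(self, client_id, now):
        """Return True if the query is within budget; record it if so."""
        q = self.events.setdefault(client_id, deque())
        while q and now - q[0] >= self.window:
            q.popleft()  # drop timestamps that aged out of the window
        if len(q) >= self.max_queries:
            return False
        q.append(now)
        return True
```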
Membership Inference (Severity: High)
An attacker determines whether a specific record was included in the training dataset. Membership inference attacks exploit overfitting to determine whether a given data point was in the training set. In medical or financial contexts, confirming training set membership reveals sensitive personal information.
Security Controls: differential privacy and regularization during training; limiting output precision (e.g., truncating probability scores); membership inference red-teaming before deployment.
Governance Touchpoint: Data Privacy Officer, Legal. GDPR right-to-erasure implications if individuals can be confirmed in training data. NIST AI RMF MEASURE 2.5.
Lifecycle Stage: Training, Evaluation.
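A common red-team baseline for membership inference is the loss-threshold attack: predict "member" when a record's loss is below a threshold, since models typically fit training records better. Attack accuracy near 0.5 on a held-out member/non-member split suggests limited leakage; the function name is illustrative.

```python
def membership_attack_accuracy(records, threshold):
    """`records` is a list of (loss, is_member) pairs with known ground
    truth. The attack predicts membership when loss < threshold; the
    returned accuracy is a leakage red-team metric (0.5 means the
    attack does no better than guessing)."""
    correct = sum((loss < threshold) == is_member for loss, is_member in records)
    return correct / len(records)
```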

Output & Behavioral Risks

Hallucination (Severity: Critical)
The model generates plausible but factually incorrect or fabricated information with apparent confidence. LLMs generate text that is statistically plausible but factually wrong, including fabricated citations, false statistics, and incorrect legal or medical guidance. Risk is highest when outputs are used without human verification in consequential decisions.
Security Controls: retrieval-augmented generation (RAG) for factual use cases; mandatory human review for consequential outputs; citation and source attribution prompting; output confidence signaling where supported.
Governance Touchpoint: AI Governance Committee defines prohibited unreviewed uses. Business unit leads own review checkpoints. NIST AI RMF MEASURE 2.1, MANAGE 2.4.
Lifecycle Stage: Operations.
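As a rough sketch of a groundedness gate in a RAG pipeline: flag answers whose sentences share too few content words with the retrieved sources. Real systems use entailment or NLI models; word overlap is only a stand-in, and all names here are illustrative.

```python
def grounded_fraction(answer_sentences, source_texts, min_overlap=0.5):
    """Fraction of answer sentences 'supported' by some retrieved source,
    where support means >= min_overlap of a sentence's content words
    (length > 3, case-folded) appear in that source. Low values route
    the answer to human review."""
    def words(t):
        return {w.strip(".,!?").lower() for w in t.split() if len(w) > 3}
    sources = [words(s) for s in source_texts]
    supported = 0
    for sent in answer_sentences:
        sw = words(sent)
        if sw and any(len(sw & src) / len(sw) >= min_overlap for src in sources):
            supported += 1
    return supported / len(answer_sentences) if answer_sentences else 1.0
```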
Model Drift (Severity: High)
Degradation in model performance over time as the real-world data distribution shifts. Models trained on historical data may underperform as the world changes. Concept drift (the relationship between inputs and outputs changes) and data drift (the input distribution changes) can both silently degrade model reliability.
Security Controls: baseline performance metrics established at deployment; scheduled drift monitoring against defined thresholds; automated alerts on performance degradation; defined re-evaluation or retirement triggers.
Governance Touchpoint: Technical owner monitors performance. AI Governance Committee reviews at defined intervals. NIST AI RMF MEASURE 1.1, MANAGE 3.2.
Lifecycle Stage: Operations.
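Data drift monitoring is often implemented with the Population Stability Index (PSI) between a deployment-time baseline and live traffic for each numeric feature; the thresholds in the comment are common rules of thumb, not normative values.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample and a live
    sample of one numeric feature. Rule of thumb (illustrative only):
    < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 investigate."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def fractions(sample):
        counts = [0] * bins
        for x in sample:
            i = min(max(int((x - lo) / width), 0), bins - 1)
            counts[i] += 1
        eps = 1e-6  # floor to avoid log(0) on empty bins
        return [max(c / len(sample), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```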
Harmful Content Generation (Severity: Critical)
The model produces content that is dangerous, illegal, discriminatory, or violates acceptable use standards. Without adequate safety training and guardrails, models can generate content that facilitates harm, including instructions for dangerous activities, hate speech, harassment material, or content that violates platform policies.
Security Controls: content safety classifier on all outputs; system prompt with prohibited output categories; human review queue for flagged outputs; incident logging and vendor escalation path.
Governance Touchpoint: AI Governance Committee defines prohibited output categories. CISO owns classifier infrastructure. NIST AI RMF GOVERN 1.2, MAP 5.2.
Lifecycle Stage: Evaluation, Deployment, Operations.
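The classifier-plus-review-queue pattern can be sketched as a three-way routing gate; `classify` stands in for any content-safety scorer returning a risk value in [0, 1], and the thresholds are illustrative, not tuned values.

```python
def route_output(text, classify, block_at=0.9, review_at=0.5):
    """Route a model output based on a safety risk score: high risk is
    blocked (and would be incident-logged), mid risk goes to a human
    review queue, low risk passes through."""
    score = classify(text)
    if score >= block_at:
        return "block"
    if score >= review_at:
        return "human_review"
    return "allow"
```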
Misuse / Dual-Use (Severity: High)
The AI system is used for purposes outside its intended scope, including malicious applications. General-purpose AI systems can be applied to harmful purposes their designers did not intend, including disinformation generation, social engineering assistance, surveillance, or acceleration of cyberattack development.
Security Controls: acceptable use policy with prohibited use categories; use case review at intake; output monitoring for misuse pattern detection; terms enforcement and account suspension capability.
Governance Touchpoint: Legal defines prohibited use categories. AI Governance Committee reviews high-risk use cases. NIST AI RMF GOVERN 1.2.
Lifecycle Stage: Deployment, Operations.

Adversarial & Input Manipulation Risks

Adversarial Examples (Severity: Critical)
Imperceptibly modified inputs cause model misclassification or unexpected behavior. Small, often imperceptible perturbations to input data (images, text, audio) can cause confident misclassification. In safety-critical systems (medical imaging, fraud detection, autonomous systems) this poses direct physical risk.
Security Controls: adversarial training with augmented examples; input preprocessing and certified defenses; ensemble methods to increase robustness; adversarial red-teaming before production deployment.
Governance Touchpoint: CISO and technical owner responsible for adversarial testing. Required for EU AI Act Art. 15 compliance in high-risk systems. NIST AI RMF MEASURE 2.5.
Lifecycle Stage: Evaluation, Deployment.
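To make the perturbation mechanics concrete, here is the Fast Gradient Sign Method applied to a toy logistic-regression scorer, a deliberate simplification of attacks on deep models: each feature moves by a small epsilon in the direction that increases the loss for the true label.

```python
import math

def fgsm_perturb(x, w, b, y, epsilon):
    """FGSM on a logistic scorer p = sigmoid(w.x + b) with true label
    y in {0, 1}: for cross-entropy loss, d(loss)/dx_i = (p - y) * w_i,
    so perturb each feature by epsilon * sign(gradient)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1 / (1 + math.exp(-z))
    grad = [(p - y) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + epsilon * sign(gi) for xi, gi in zip(x, grad)]
```

Even this linear toy shows the core issue: a bounded, per-feature change reliably lowers the model's score for the true class.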
Prompt Injection (Severity: Critical)
Malicious instructions in user input override system prompts or hijack model behavior. User-supplied input containing adversarial instructions can override system-level prompts, causing the model to ignore safety guidelines, reveal confidential information, or act as an unauthorized agent. OWASP LLM Top 10 #1.
Security Controls: strict system prompt / user input separation; input sanitization and injection pattern detection; privilege separation so that untrusted input cannot invoke privileged actions; adversarial prompt test suite in pre-deployment.
Governance Touchpoint: CISO owns technical controls. AI Governance Committee reviews agentic deployments. NIST AI RMF MAP 5.1, MANAGE 2.2.
Lifecycle Stage: Deployment, Operations.
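A sketch of the separation-plus-detection controls: keep untrusted input in its own message role and screen it against known injection phrasings. The regex list is illustrative and trivially evadable on its own; real deployments pair patterns with trained classifiers.

```python
import re

# Illustrative patterns only, not a complete or robust rule set.
INJECTION_PATTERNS = [
    r"ignore (all |any )?(previous|prior|above) instructions",
    r"you are now",
    r"disregard .{0,40}(system prompt|guidelines)",
]

def flag_injection(user_input):
    text = user_input.lower()
    return any(re.search(p, text) for p in INJECTION_PATTERNS)

def build_messages(system_prompt, user_input):
    """Keep system instructions and untrusted input in separate roles;
    never concatenate user text into the system prompt."""
    if flag_injection(user_input):
        raise ValueError("possible prompt injection; route to review")
    return [{"role": "system", "content": system_prompt},
            {"role": "user", "content": user_input}]
```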
Indirect Prompt Injection (Severity: Critical)
Malicious instructions embedded in external data sources are retrieved and processed by an AI agent. When AI agents retrieve and process external content (web pages, documents, emails), adversaries can embed hidden instructions that hijack agent behavior. Particularly dangerous in agentic systems with tool use and API access.
Security Controls: treat all retrieved content as untrusted; sanitize external data before context inclusion; restrict agent tool scope when processing external content; audit logs of agent actions for anomaly detection.
Governance Touchpoint: CISO and technical owner. AI Governance Committee reviews all Tier 2+ agentic systems. NIST AI RMF GOVERN 6.1.
Lifecycle Stage: Deployment, Operations.
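Restricting agent tool scope can be sketched as dropping privileged tools whenever untrusted external content enters the context window; the tool names below are hypothetical.

```python
# Hypothetical allowlist: tools safe to expose while untrusted external
# content (web pages, documents, emails) is in the agent's context.
SAFE_TOOLS_WITH_EXTERNAL_CONTENT = {"summarize", "search"}

def allowed_tools(base_tools, context_has_external_content):
    """Drop privileged tools (e.g. anything that sends, writes, or pays)
    whenever untrusted content could be steering the agent."""
    if context_has_external_content:
        return base_tools & SAFE_TOOLS_WITH_EXTERNAL_CONTENT
    return base_tools
```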
Evasion Attacks (Severity: High)
Inputs are crafted to avoid detection by AI-based security classifiers. When AI is deployed for security purposes (fraud detection, content moderation, malware detection), adversaries craft inputs that exploit model blind spots to evade detection while still achieving malicious objectives.
Security Controls: adversarial training against known evasion techniques; ensemble and diverse model classifiers; continuous monitoring for evasion pattern emergence; human-in-the-loop for ambiguous classifier outputs.
Governance Touchpoint: CISO. Evasion risk assessment required before deploying AI in security-critical roles. NIST AI RMF MEASURE 2.5.
Lifecycle Stage: Evaluation, Operations.
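Ensemble detection raises evasion cost because an input must fool several diverse models at once rather than exploiting one model's blind spot. A minimal voting sketch, with detectors as stand-ins for real classifiers:

```python
def ensemble_detect(input_item, detectors, min_votes=2):
    """Flag `input_item` when at least `min_votes` of the detectors
    (each a callable returning True on suspected-malicious input) fire.
    Diversity across detectors is what makes evasion expensive."""
    votes = sum(1 for d in detectors if d(input_item))
    return votes >= min_votes
```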
Usage Note

This taxonomy is designed as a living reference. New risk categories should be added as the threat landscape evolves. Each entry should be reviewed annually and updated following any significant incident, model update, or new research publication affecting the relevant risk category.