AI Development Lifecycle Reference
Risks are mapped to the lifecycle stage where they originate or are most effectively controlled. Each taxonomy entry below includes lifecycle stage tags for quick reference.
| # | Stage | Focus |
|---|---|---|
| 1 | Data Collection | Sourcing, labeling, consent |
| 2 | Model Training | Architecture, optimization, tuning |
| 3 | Evaluation | Testing, red-teaming, benchmarks |
| 4 | Deployment | Integration, access, monitoring |
| 5 | Operations | Inference, user interaction |
| 6 | Decommission | Retirement, data disposal |
Category A: Training & Data Integrity Risks
| Risk | Severity | Description | Security Controls | Governance Touchpoint | Lifecycle Stage |
|---|---|---|---|---|---|
| **Data Poisoning**<br>Malicious injection of corrupted training data to manipulate model behavior at inference time | Critical | Attacker introduces manipulated samples into the training pipeline, causing the model to learn a backdoor trigger or degrade performance on specific inputs. Difficult to detect after training. | • Data provenance tracking for all training sources<br>• Input validation and anomaly detection on training pipelines<br>• Differential privacy techniques to limit single-sample influence<br>• Canary data injection for detection | Data Governance Lead, MLOps team. Requires sign-off on data sources at intake. NIST AI RMF MAP 2.3. | Data Collection, Training |
| **Training Data Bias**<br>Systematic skew in training data that causes model to produce discriminatory or unfair outputs across demographic groups | Critical | Under-representation or misrepresentation of demographic groups, geographic regions, or time periods in training data produces outputs that disadvantage affected populations at scale. | • Demographic parity and disparate impact testing<br>• Stratified sampling and balanced dataset curation<br>• Bias evaluation benchmarks before deployment<br>• Ongoing output monitoring post-deployment | AI Governance Committee, Legal (EU AI Act Art. 10), HR for employment use cases. NIST AI RMF MEASURE 2.2. | Data Collection, Evaluation |
| **Intellectual Property Leakage**<br>Model memorizes and reproduces copyrighted or proprietary content from training data | High | LLMs trained on large corpora can memorize verbatim text from training data, including copyrighted works, PII, or proprietary documents, and reproduce them in outputs. | • Training data licensing audit and documentation<br>• Membership inference testing during evaluation<br>• Output filtering for verbatim reproduction patterns<br>• Copyright detection on outputs in production | Legal review of training data sourcing. Required under EU AI Act Art. 53 for GPAI models. NIST AI RMF GOVERN 6.2. | Training, Operations |
| **Label Manipulation**<br>Corruption or adversarial tampering with human-annotated labels used in supervised training | High | In annotation pipelines relying on crowdsourced or third-party labelers, adversarial or low-quality labels can systematically distort model behavior in targeted ways. | • Inter-annotator agreement monitoring<br>• Annotation quality audits and adversarial labeler detection<br>• Redundant labeling for high-stakes categories | Data Governance Lead. Vendor contracts must include labeling quality SLAs. NIST AI RMF MAP 2.3. | Data Collection, Training |
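The disparate impact testing listed as a control for Training Data Bias can start as a four-fifths-rule check over per-group selection rates. A minimal sketch, assuming binary model decisions grouped by demographic label (the function name, input shape, and 0.8 threshold are illustrative conventions, not prescribed by this taxonomy):

```python
def disparate_impact(outcomes):
    """Compute each group's selection rate relative to the best-treated group.

    outcomes: dict mapping group label -> list of 0/1 model decisions.
    Returns (ratios, flagged): ratios maps each group to its selection rate
    divided by the highest group's rate; flagged lists groups falling below
    the four-fifths (0.8) threshold commonly used in disparate impact testing.
    """
    rates = {g: sum(v) / len(v) for g, v in outcomes.items()}
    top = max(rates.values())
    ratios = {g: r / top for g, r in rates.items()}
    flagged = sorted(g for g, r in ratios.items() if r < 0.8)
    return ratios, flagged
```

A check like this belongs in the pre-deployment bias evaluation step and again in post-deployment output monitoring, so drift in group-level treatment is caught on the same metric.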
Category B: Model Extraction & Privacy Attacks
| Risk | Severity | Description | Security Controls | Governance Touchpoint | Lifecycle Stage |
|---|---|---|---|---|---|
| **Model Inversion**<br>Attacker reconstructs sensitive training data by querying model outputs | Critical | By systematically querying a model and analyzing its confidence scores or outputs, an attacker can reconstruct training samples, including private individual records, medical data, or facial images used in training. | • Differential privacy in training to limit information leakage<br>• Output confidence score suppression or noise injection<br>• API rate limiting and anomaly detection on query patterns<br>• Membership inference auditing before deployment | CISO, Data Privacy Officer. Required disclosure under GDPR if training data includes personal data. NIST AI RMF MAP 5.1. | Training, Deployment, Operations |
| **Model Extraction**<br>Attacker reconstructs a functional copy of a proprietary model by querying its API | High | Through systematic input-output queries, an adversary can train a surrogate model that approximates the behavior of a proprietary model, enabling IP theft and further adversarial attacks without access to the original. | • Query rate limiting and anomaly detection<br>• Watermarking model outputs for attribution<br>• API access controls and authentication<br>• Terms of service prohibiting systematic querying | Legal (IP protection), CISO (API security). NIST AI RMF MANAGE 2.2. | Deployment, Operations |
| **Membership Inference**<br>Attacker determines whether a specific record was included in the training dataset | High | Membership inference attacks exploit overfitting to determine whether a given data point was in the training set. In medical or financial contexts, confirming training set membership reveals sensitive personal information. | • Differential privacy and regularization during training<br>• Limit output precision (e.g., truncate probability scores)<br>• Membership inference red-teaming before deployment | Data Privacy Officer, Legal. GDPR right-to-erasure implications if individuals can be confirmed in training data. NIST AI RMF MEASURE 2.5. | Training, Evaluation |
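The membership inference red-teaming called for above is often approximated with a confidence-threshold attack: if the model is systematically more confident on training records than on held-out records, membership leaks. A minimal sketch, assuming per-record confidences are available for known members and non-members (the function name and inputs are illustrative):

```python
def attack_advantage(member_conf, nonmember_conf):
    """Best balanced accuracy of a threshold attack that predicts 'member'
    when the model's confidence on a record is >= t.

    A result near 0.5 means the attacker does no better than guessing;
    values approaching 1.0 indicate the model reveals training-set
    membership, typically via overfitting.
    """
    best = 0.5
    for t in sorted(set(member_conf) | set(nonmember_conf)):
        tpr = sum(c >= t for c in member_conf) / len(member_conf)
        fpr = sum(c >= t for c in nonmember_conf) / len(nonmember_conf)
        best = max(best, (tpr + (1 - fpr)) / 2)
    return best
```

Running this during evaluation gives a concrete go/no-go number for the red-teaming gate; if advantage is high, the mitigations in the table (regularization, differential privacy, truncated output scores) apply before deployment.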
Category C: Output & Behavioral Risks
| Risk | Severity | Description | Security Controls | Governance Touchpoint | Lifecycle Stage |
|---|---|---|---|---|---|
| **Hallucination**<br>Model generates plausible but factually incorrect or fabricated information with apparent confidence | Critical | LLMs generate text that is statistically plausible but factually wrong, including fabricated citations, false statistics, and incorrect legal or medical guidance. Risk is highest when outputs are used without human verification in consequential decisions. | • Retrieval-augmented generation (RAG) for factual use cases<br>• Mandatory human review for consequential outputs<br>• Citation and source attribution prompting<br>• Output confidence signaling where supported | AI Governance Committee defines prohibited unreviewed uses. Business unit leads own review checkpoints. NIST AI RMF MEASURE 2.1, MANAGE 2.4. | Operations |
| **Model Drift**<br>Degradation in model performance over time as real-world data distribution shifts | High | Models trained on historical data may underperform as the world changes. Concept drift (the relationship between inputs and outputs changes) and data drift (input distribution changes) can both silently degrade model reliability. | • Baseline performance metrics established at deployment<br>• Scheduled drift monitoring against defined thresholds<br>• Automated alert on performance degradation<br>• Defined re-evaluation or retirement triggers | Technical owner monitors performance. AI Governance Committee reviews at defined intervals. NIST AI RMF MEASURE 1.1, MANAGE 3.2. | Operations |
| **Harmful Content Generation**<br>Model produces content that is dangerous, illegal, discriminatory, or violates acceptable use standards | Critical | Without adequate safety training and guardrails, models can generate content that facilitates harm, including instructions for dangerous activities, hate speech, harassment material, or content that violates platform policies. | • Content safety classifier on all outputs<br>• System prompt with prohibited output categories<br>• Human review queue for flagged outputs<br>• Incident logging and vendor escalation path | AI Governance Committee defines prohibited output categories. CISO owns classifier infrastructure. NIST AI RMF GOVERN 1.2, MAP 5.2. | Evaluation, Deployment, Operations |
| **Misuse / Dual-Use**<br>AI system is used for purposes outside its intended scope, including malicious applications | High | General-purpose AI systems can be applied to harmful purposes their designers did not intend, including disinformation generation, social engineering assistance, surveillance, or acceleration of cyberattack development. | • Acceptable use policy with prohibited use categories<br>• Use case review at intake<br>• Output monitoring for misuse pattern detection<br>• Terms enforcement and account suspension capability | Legal defines prohibited use categories. AI Governance Committee reviews high-risk use cases. NIST AI RMF GOVERN 1.2. | Deployment, Operations |
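The scheduled drift monitoring listed under Model Drift is commonly implemented with a Population Stability Index (PSI) over input features, comparing a baseline sample frozen at deployment against a current production sample. A dependency-free sketch; the bin count and the conventional 0.1/0.25 interpretation bands are industry rules of thumb, not requirements of this taxonomy:

```python
import math

def psi(baseline, current, bins=10):
    """Population Stability Index between a baseline sample and a current
    production sample of one numeric feature.

    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 alert.
    """
    lo, hi = min(baseline), max(baseline)
    # equal-width bin edges fixed by the baseline range
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def fractions(values):
        counts = [0] * bins
        for v in values:
            counts[sum(v > e for e in edges)] += 1
        # floor at a small epsilon so the log term stays defined for empty bins
        return [max(c / len(values), 1e-6) for c in counts]

    p, q = fractions(baseline), fractions(current)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))
```

Wiring this into the automated-alert control means evaluating `psi` per feature on a schedule and paging the technical owner when any feature crosses the defined threshold, which also gives the governance committee a concrete trigger for re-evaluation or retirement.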
Category D: Adversarial & Input Manipulation Risks
| Risk | Severity | Description | Security Controls | Governance Touchpoint | Lifecycle Stage |
|---|---|---|---|---|---|
| **Adversarial Examples**<br>Imperceptibly modified inputs that cause model misclassification or unexpected behavior | Critical | Small, often imperceptible perturbations to input data (images, text, audio) can cause confident misclassification. In safety-critical systems (medical imaging, fraud detection, autonomous systems) this poses direct physical risk. | • Adversarial training with augmented examples<br>• Input preprocessing and certified defenses<br>• Ensemble methods to increase robustness<br>• Adversarial red-teaming before production deployment | CISO and technical owner responsible for adversarial testing. Required for EU AI Act Art. 15 compliance in high-risk systems. NIST AI RMF MEASURE 2.5. | Evaluation, Deployment |
| **Prompt Injection**<br>Malicious instructions in user input override system prompts or hijack model behavior | Critical | User-supplied input containing adversarial instructions can override system-level prompts, causing the model to ignore safety guidelines, reveal confidential information, or act as an unauthorized agent. OWASP LLM Top 10 #1. | • Strict system prompt / user input separation<br>• Input sanitization and injection pattern detection<br>• Privilege separation: untrusted input cannot invoke privileged actions<br>• Adversarial prompt test suite in pre-deployment | CISO owns technical controls. AI Governance Committee reviews agentic deployments. NIST AI RMF MAP 5.1, MANAGE 2.2. | Deployment, Operations |
| **Indirect Prompt Injection**<br>Malicious instructions embedded in external data sources retrieved and processed by an AI agent | Critical | When AI agents retrieve and process external content (web pages, documents, emails), adversaries can embed hidden instructions that hijack agent behavior. Particularly dangerous in agentic systems with tool use and API access. | • Treat all retrieved content as untrusted<br>• Sanitize external data before context inclusion<br>• Restrict agent tool scope when processing external content<br>• Audit logs of agent actions for anomaly detection | CISO and technical owner. AI Governance Committee reviews all Tier 2+ agentic systems. NIST AI RMF GOVERN 6.1. | Deployment, Operations |
| **Evasion Attacks**<br>Inputs crafted to avoid detection by AI-based security classifiers | High | When AI is deployed for security purposes (fraud detection, content moderation, malware detection), adversaries craft inputs that exploit model blind spots to evade detection while still achieving malicious objectives. | • Adversarial training against known evasion techniques<br>• Ensemble and diverse model classifiers<br>• Continuous monitoring for evasion pattern emergence<br>• Human-in-the-loop for ambiguous classifier outputs | CISO. Evasion risk assessment required before deploying AI in security-critical roles. NIST AI RMF MEASURE 2.5. | Evaluation, Operations |
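The injection pattern detection listed under Prompt Injection can begin as a naive phrase screen over user input. A sketch only: the pattern list below is illustrative and deliberately small, and a screen like this complements, never replaces, the strict system/user separation and privilege-separation controls in the table, since adversaries trivially rephrase around fixed patterns:

```python
import re

# Illustrative patterns only; real deployments curate and continuously
# update these from red-team findings and production incidents.
INJECTION_PATTERNS = [
    r"ignore (all |any )?(previous|prior|above) instructions",
    r"disregard (the )?(system|developer) prompt",
    r"reveal (your|the) (system |hidden )?prompt",
]

def flag_injection(user_input):
    """Return the patterns matched in user_input (empty list = no flag)."""
    lowered = user_input.lower()
    return [p for p in INJECTION_PATTERNS if re.search(p, lowered)]
```

Flagged inputs would feed the adversarial prompt test suite and the audit logs referenced above, so detection quality can be measured against real traffic rather than assumed.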
Usage Note
This taxonomy is designed as a living reference. New risk categories should be added as the threat landscape evolves. Each entry should be reviewed annually and updated following any significant incident, model update, or new research publication affecting the relevant risk category.