Exam focus
Primary domain: Data for LLM Applications (10%). Secondary: Productionizing LLM Solutions (22%).
- Data pipelines
- ETL workflows
- Feature engineering
- Embedding pipelines
- Dataset labeling
- Versioning datasets
- Data governance
- Model versioning
- Experiment tracking
- MLOps concepts
- CI/CD for ML
- Monitoring deployed models
- Drift detection
- Feedback loops
Scope bullet explanations
- Data pipelines: Automated data movement/processing from source to model-ready artifacts.
- ETL workflows: Extract, transform, load steps for structured data preparation.
- Feature engineering: Creating useful input signals from raw data.
- Embedding pipelines: Generating, storing, and refreshing vector representations.
- Dataset labeling: Creating supervised targets and annotation quality controls.
- Versioning datasets: Immutable dataset snapshots for reproducibility and audit.
- Data governance: Policies for data quality, access, lineage, and stewardship.
- Model versioning: Tracking model artifacts across training and deployment cycles.
- Experiment tracking: Logging parameters, metrics, artifacts, and outcomes per run.
- MLOps concepts: Operational practices for reliable ML/LLM delivery.
- CI/CD for ML: Automated validation, packaging, and release processes for model systems.
- Monitoring deployed models: Continuous quality, latency, cost, and safety monitoring.
- Drift detection: Detecting shifts in input data or model behavior over time.
- Feedback loops: Using user/system signals to drive iterative improvements.
Chapter overview
Data workflows determine whether LLM systems stay reliable over time. This chapter covers ETL patterns, dataset and model versioning, governance, experiment tracking, monitoring, drift detection, and the feedback loops needed for operational maturity.
Learning objectives
- Design data and embedding pipelines for repeatable LLM application quality.
- Apply dataset and model versioning for reproducibility and auditability.
- Integrate MLOps concepts including CI/CD for ML systems.
- Detect drift and trigger corrective actions through feedback loops.
12.1 Data pipeline foundations
ETL workflows
Extract data from source systems, transform it into normalized formats, and load it into model-ready stores.
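The three ETL steps can be sketched as composable functions. This is a minimal illustration with a hypothetical record schema (`id`, `text`, `source`) and an in-memory list standing in for the model-ready store:

```python
# Minimal ETL sketch: extract raw records, transform them into a
# normalized schema, and load them into a model-ready store.
# The field names and the in-memory "store" are illustrative.

def extract(raw_rows):
    """Extract: pull raw records from a source system (stubbed as a list)."""
    return list(raw_rows)

def transform(rows):
    """Transform: normalize fields and drop records missing required keys."""
    out = []
    for row in rows:
        if not row.get("text"):
            continue  # data quality gate: skip empty documents
        out.append({
            "doc_id": str(row["id"]),
            "text": row["text"].strip().lower(),
            "source": row.get("source", "unknown"),
        })
    return out

def load(rows, store):
    """Load: append normalized rows to the model-ready store."""
    store.extend(rows)
    return len(rows)

store = []
raw = [{"id": 1, "text": "  Hello World  "}, {"id": 2, "text": ""}]
loaded = load(transform(extract(raw)), store)
```

Keeping each step as a pure function makes the pipeline testable and lets a quality gate (here, the empty-text check) live inside the transform.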
Feature and embedding pipelines
For LLM apps, feature engineering often includes text cleaning, metadata enrichment, chunking, and embedding generation.
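Chunking is the step that most often needs explicit parameters. A sketch of fixed-size, overlapping character chunks ahead of embedding generation (sizes are illustrative, not recommendations):

```python
# Chunking step for an embedding pipeline: split cleaned text into
# fixed-size, overlapping chunks before embedding. The chunk_size and
# overlap values below are illustrative defaults, not recommendations.

def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character chunks for embedding."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece:
            chunks.append(piece)
    return chunks
```

The overlap preserves context that straddles chunk boundaries; production pipelines often chunk on tokens or sentence boundaries instead of raw characters.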
Labeling workflows
Labeling quality affects fine-tuning, evaluation, and alignment. Include rubric guidance, a sampling strategy, and QA checks.
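One common QA check is annotator agreement. A minimal sketch using raw percent agreement between two annotators (real workflows often prefer chance-corrected metrics such as Cohen's kappa; the threshold here is hypothetical):

```python
# Labeling QA sketch: raw percent agreement between two annotators on
# the same items, used as a batch acceptance gate. Real workflows often
# use chance-corrected metrics (e.g. Cohen's kappa); the 0.8 threshold
# is a hypothetical example.

def percent_agreement(labels_a, labels_b):
    """Fraction of items on which both annotators assigned the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("annotators must label the same items")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)

def qa_gate(labels_a, labels_b, threshold=0.8):
    """Fail the labeling batch if agreement falls below the rubric threshold."""
    return percent_agreement(labels_a, labels_b) >= threshold
```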
12.2 Versioning and lineage
Dataset versioning
Every training or evaluation run should reference immutable dataset versions.
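One simple way to get immutable version IDs is to derive them from a content hash, so any change to the data produces a new ID. A sketch under the assumption that the dataset is JSON-serializable:

```python
# Dataset-versioning sketch: derive an immutable version ID from a
# content hash of the serialized dataset. Any change to the records
# yields a different ID; identical snapshots always hash the same.
# Assumes the records are JSON-serializable.

import hashlib
import json

def dataset_version(records):
    """Return a deterministic version ID for an immutable dataset snapshot."""
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return "ds-" + hashlib.sha256(payload).hexdigest()[:12]
```

Training and evaluation runs can then record the `ds-...` ID, making it an audit-friendly pointer to exactly one snapshot.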
Model versioning
Track model checkpoints, adaptation artifacts, and deployment tags.
End-to-end lineage
Link: source data version -> preprocessing version -> training run -> model version -> deployment version.
Lineage is essential for debugging, audit, and rollback.
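The lineage chain above can be captured as one immutable record per deployment, so every stage version is queryable during an incident. Field names are illustrative:

```python
# End-to-end lineage sketch mirroring the chain:
# source data -> preprocessing -> training run -> model -> deployment.
# The field names and example IDs are illustrative.

from dataclasses import dataclass, asdict

@dataclass(frozen=True)  # frozen: lineage records should be immutable
class LineageRecord:
    source_data_version: str
    preprocessing_version: str
    training_run_id: str
    model_version: str
    deployment_version: str

rec = LineageRecord(
    source_data_version="ds-001",
    preprocessing_version="prep-v3",
    training_run_id="run-42",
    model_version="model-1.2.0",
    deployment_version="deploy-7",
)
```

Given a misbehaving deployment ID, walking the record backwards identifies exactly which data and preprocessing versions to inspect or roll back to.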
12.3 Experiment tracking and MLOps
Experiment tracking
Log hyperparameters, metrics, artifacts, and code revisions per run.
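The per-run record can be as simple as one structured entry. A minimal in-memory sketch (a real setup would use a tracking server such as MLflow or Weights & Biases; the field names are illustrative):

```python
# Experiment-tracking sketch: log hyperparameters, metrics, artifacts,
# and the code revision per run as one structured record. In-memory
# only; a real setup would persist this to a tracking server.

import time

RUNS = []

def log_run(params, metrics, artifacts, code_rev):
    """Append one run record and return its run ID."""
    run = {
        "run_id": f"run-{len(RUNS) + 1}",
        "timestamp": time.time(),
        "params": params,       # hyperparameters used for this run
        "metrics": metrics,     # evaluation outcomes
        "artifacts": artifacts, # paths/IDs of produced files
        "code_rev": code_rev,   # VCS revision for reproducibility
    }
    RUNS.append(run)
    return run["run_id"]
```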
CI/CD for ML
ML pipelines need additional gates beyond software unit tests:
- data quality checks,
- evaluation thresholds,
- bias/safety checks,
- canary rollout controls.
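The four gates above can be combined into a single release check where every gate must pass before rollout proceeds. The report fields and thresholds here are hypothetical:

```python
# CI/CD gate sketch: evaluate the four release gates listed above
# against a candidate model's evaluation report. Report field names
# and thresholds are hypothetical examples.

def release_gates(report):
    """Return (passed, failures) for a candidate model's evaluation report."""
    gates = {
        "data_quality": report["null_rate"] <= 0.01,
        "eval_threshold": report["eval_score"] >= 0.85,
        "safety": report["safety_violations"] == 0,
        "canary_ready": report["canary_error_rate"] <= 0.02,
    }
    failures = [name for name, ok in gates.items() if not ok]
    return len(failures) == 0, failures
```

Returning the named failures, not just a boolean, makes the gate report actionable in a pipeline log.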
Monitoring deployed models
Monitor quality, latency, error patterns, and policy violations continuously.
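A sketch of a rolling monitor that tracks quality, latency, and policy violations over a recent window of requests (window size and field names are illustrative):

```python
# Monitoring sketch: rolling window over recent requests, tracking
# quality, latency, and policy-violation rates together. Window size
# and field names are illustrative.

from collections import deque

class ModelMonitor:
    def __init__(self, window=100):
        # deque(maxlen=...) keeps only the most recent events
        self.events = deque(maxlen=window)

    def record(self, quality_ok, latency_ms, policy_violation):
        """Record one request outcome."""
        self.events.append((quality_ok, latency_ms, policy_violation))

    def summary(self):
        """Aggregate rates over the current window."""
        n = len(self.events)
        return {
            "quality_rate": sum(q for q, _, _ in self.events) / n,
            "avg_latency_ms": sum(l for _, l, _ in self.events) / n,
            "violation_rate": sum(v for _, _, v in self.events) / n,
        }
```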
12.4 Drift detection and feedback loops
Drift types
- Data drift: input distribution changes.
- Concept drift: relationship between inputs and desired outputs changes.
Drift monitoring signals
- embedding distribution shift,
- retrieval relevance decline,
- answer quality degradation,
- increased escalation or correction rate.
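The first signal above can be sketched numerically: compare the centroid of a reference embedding sample against a recent one and flag drift when cosine similarity drops below a threshold. The threshold and toy vectors are illustrative:

```python
# Drift-signal sketch: detect embedding distribution shift by comparing
# the centroid of a reference embedding sample against a recent one
# using cosine similarity. The 0.95 threshold is illustrative.

import math

def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def embedding_drift(reference, current, min_similarity=0.95):
    """Flag drift when the centroids diverge beyond the similarity threshold."""
    return cosine(centroid(reference), centroid(current)) < min_similarity
```

Centroid comparison is a coarse signal; finer-grained tests (e.g. per-dimension statistics) catch shifts that leave the mean unchanged.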
Feedback loops
Use user feedback and incident data to trigger:
- prompt updates,
- retrieval/index updates,
- retraining or re-alignment cycles.
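The routing from signals to corrective actions can be sketched as a simple policy function. Signal names and thresholds below are hypothetical:

```python
# Feedback-loop sketch: route observed feedback/incident signals to the
# corrective actions listed above. Signal names and thresholds are
# hypothetical examples of trigger criteria.

def corrective_actions(signals):
    """Map observed feedback signals to the next remediation steps."""
    actions = []
    if signals.get("prompt_failure_rate", 0.0) > 0.05:
        actions.append("update_prompts")
    if signals.get("retrieval_miss_rate", 0.0) > 0.10:
        actions.append("refresh_retrieval_index")
    if signals.get("concept_drift", False):
        actions.append("schedule_retraining")
    return actions
```

Making trigger criteria explicit in code (rather than ad hoc judgment) is what turns feedback into a repeatable loop.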
12.5 Operational governance for data workflows
- enforce access controls,
- protect sensitive data,
- define retention and deletion rules,
- maintain change approval process for pipeline modifications.
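Governance rules like these can be enforced automatically in the change-approval pipeline. A sketch with hypothetical policy fields:

```python
# Governance-check sketch: validate a pipeline configuration against
# retention, sensitive-data, and change-approval rules before a
# modification ships. All policy fields and limits are hypothetical.

def governance_violations(config):
    """Return a list of policy violations for a pipeline configuration."""
    violations = []
    if config.get("retention_days", 0) > 365:
        violations.append("retention exceeds 365-day limit")
    if "pii" in config.get("fields", []) and not config.get("encrypted", False):
        violations.append("sensitive fields must be encrypted")
    if not config.get("approved_by"):
        violations.append("change lacks approval record")
    return violations
```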
12.6 Failure modes
- Training on moving datasets with no snapshot control.
- No experiment tracking, leading to irreproducible improvements.
- Monitoring only infrastructure metrics while quality drifts.
- No explicit trigger criteria for retraining decisions.
Chapter summary
Reliable LLM systems require disciplined data and workflow engineering. Versioning, monitoring, and feedback loops are not optional overhead; they are the core mechanism for maintaining quality in production.
Mini-lab: end-to-end MLOps map
Goal: define a complete lifecycle workflow for one LLM feature.
- List data sources and ETL steps.
- Define dataset, model, and deployment version IDs.
- Specify experiment logging fields.
- Define CI/CD gates and release criteria.
- Add drift signals and retraining triggers.
- Assign owners for each stage.
Deliverable in Notion:
- Lifecycle map with lineage fields, monitoring rules, and trigger thresholds.
Review questions
- Why is dataset versioning mandatory for reproducibility?
- What extra controls does ML CI/CD need compared to classic software CI/CD?
- How does experiment tracking reduce incident resolution time?
- What distinguishes data drift from concept drift?
- Which monitoring signals best predict future quality regressions?
- Why must model lineage include preprocessing versions?
- When should drift trigger prompt updates versus retraining?
- How do access controls fit into data governance for LLMs?
- What failure occurs when quality metrics are excluded from monitoring?
- Why are feedback loops central to long-term reliability?
Key terms
ETL, embedding pipeline, dataset versioning, model lineage, experiment tracking, MLOps, CI/CD for ML, monitoring, data drift, concept drift, feedback loop.
Exam traps
- Assuming one successful launch means pipeline maturity.
- Ignoring lineage between preprocessing and model behavior.
- Treating drift detection as optional in low-volume systems.