

NCA-GENL Mock Questions

Multiple-choice practice to test concept understanding

80 total questions · 4 sets · 20 per set · no duplicates

How to Use

  1. Attempt each set without opening answers first.
  2. Use explanations to identify weak domains and chapter gaps.
  3. Repeat missed questions after reviewing related chapter pages.

Set 1

Set 1 · Question 1 · Chapter 1

What target does self-supervised LLM pretraining usually optimize?

  1. Next-token prediction on unlabeled text
  2. Human preference rankings only
  3. Image segmentation masks
  4. SQL query execution accuracy
Show answer

Correct: Next-token prediction on unlabeled text

LLM pretraining typically uses next-token prediction built from raw text without manual labels.
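A minimal sketch of how next-token targets come from the text itself (the token IDs here are hypothetical, standing in for real tokenizer output):

```python
# Self-supervised next-token targets: shift the token sequence by one.
# No manual labels are needed; the raw text supplies the targets.
token_ids = [12, 7, 99, 4, 31]  # hypothetical IDs from a tokenizer

inputs = token_ids[:-1]   # model sees tokens 0..n-2
targets = token_ids[1:]   # model must predict tokens 1..n-1

print(inputs)   # [12, 7, 99, 4]
print(targets)  # [7, 99, 4, 31]
```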

Set 1 · Question 2 · Chapter 1

In a support assistant, which component is usually discriminative?

  1. Intent classifier that routes tickets
  2. Model generating final answer text
  3. Image generator producing screenshots
  4. Synthetic data creator
Show answer

Correct: Intent classifier that routes tickets

Classification/routing is a discriminative task; answer generation is generative.

Set 1 · Question 3 · Chapter 1

What is a practical benefit of transfer learning for enterprise teams?

  1. Lower adaptation time and compute cost
  2. Guaranteed zero hallucinations
  3. No need for evaluation
  4. No tokenizer required
Show answer

Correct: Lower adaptation time and compute cost

Starting from pretrained weights reduces data, time, and compute needed for downstream tasks.

Set 1 · Question 4 · Chapter 1

Which loss is standard for token prediction tasks?

  1. Mean squared error
  2. Cross-entropy
  3. Hinge loss
  4. Huber loss
Show answer

Correct: Cross-entropy

Cross-entropy aligns naturally with probabilistic next-token prediction.
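For a single next-token step, cross-entropy reduces to the negative log-probability the model assigned to the true token; a small sketch with hypothetical probabilities:

```python
import math

def cross_entropy(probs, true_index):
    """Negative log-likelihood of the true token under the model's distribution."""
    return -math.log(probs[true_index])

# Model distribution over a tiny 3-token vocabulary (hypothetical values).
probs = [0.7, 0.2, 0.1]
loss_confident = cross_entropy(probs, 0)  # true token got high probability
loss_wrong = cross_entropy(probs, 2)      # true token got low probability
print(round(loss_confident, 3))  # 0.357
print(round(loss_wrong, 3))      # 2.303
```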

Set 1 · Question 5 · Chapter 1

Which technique most directly helps stabilize exploding gradients?

  1. Gradient clipping
  2. Vocabulary expansion
  3. Prompt caching
  4. Label smoothing only
Show answer

Correct: Gradient clipping

Gradient clipping limits update magnitude and helps prevent unstable loss spikes.
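A minimal sketch of clipping by global norm, the common variant (gradients shown as a flat list of floats for simplicity):

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Scale gradients down if their global L2 norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return grads
    scale = max_norm / norm
    return [g * scale for g in grads]

grads = [3.0, 4.0]                       # global norm = 5.0
clipped = clip_by_global_norm(grads, 1.0)
print(clipped)                           # scaled so the norm is now 1.0
```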

Set 1 · Question 6 · Chapter 2

Which transformer family is dominant for autoregressive chat generation?

  1. Encoder-only
  2. Decoder-only
  3. Encoder-decoder only
  4. CNN-RNN hybrid
Show answer

Correct: Decoder-only

Decoder-only models predict the next token from prior context and are common for chat.

Set 1 · Question 7 · Chapter 2

Why are positional encodings needed in transformers?

  1. To add token order information
  2. To compress model weights
  3. To remove attention heads
  4. To replace tokenization
Show answer

Correct: To add token order information

Attention alone is permutation-invariant, so position signals are needed for sequence order.
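A sketch of the classic sinusoidal encoding from the original transformer paper, with a hypothetical model width of 8 to keep it readable:

```python
import math

def sinusoidal_position(pos, dim, d_model=8):
    """One element of the sinusoidal positional encoding."""
    angle = pos / (10000 ** (2 * (dim // 2) / d_model))
    return math.sin(angle) if dim % 2 == 0 else math.cos(angle)

# Different positions get different encoding vectors, giving attention
# (which is otherwise permutation-invariant) a notion of token order.
pe_pos0 = [sinusoidal_position(0, d) for d in range(8)]
pe_pos1 = [sinusoidal_position(1, d) for d in range(8)]
print(pe_pos0[:2])  # [0.0, 1.0]
```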

Set 1 · Question 8 · Chapter 2

What is the core value of multi-head attention?

  1. It allows multiple relational views in parallel
  2. It removes the need for FFN blocks
  3. It guarantees factual correctness
  4. It makes context windows infinite
Show answer

Correct: It allows multiple relational views in parallel

Different heads can learn complementary relationships such as syntax and long-range dependencies.

Set 1 · Question 9 · Chapter 2

Which statement about context windows is correct?

  1. Larger windows always improve quality
  2. Only user text counts toward token limits
  3. Long context can still degrade quality if retrieval is noisy
  4. Context limits affect training only, not inference
Show answer

Correct: Long context can still degrade quality if retrieval is noisy

Large windows still require relevant, clean context; irrelevant chunks can hurt answer quality.

Set 1 · Question 10 · Chapter 2

What does the KV cache primarily improve during decoding?

  1. Data privacy
  2. Generation latency
  3. Tokenizer quality
  4. Model alignment
Show answer

Correct: Generation latency

Reusing prior key/value states avoids redundant computation and reduces token generation latency.

Set 1 · Question 11 · Chapter 3

Which sequence best describes lifecycle order for many LLM projects?

  1. Fine-tuning -> pretraining -> tokenization
  2. Pretraining -> adaptation (SFT/PEFT) -> deployment
  3. Deployment -> pretraining -> labeling
  4. Prompting -> backpropagation -> retrieval
Show answer

Correct: Pretraining -> adaptation (SFT/PEFT) -> deployment

General capability is built in pretraining, then adapted and deployed for specific tasks.

Set 1 · Question 12 · Chapter 3

Supervised fine-tuning (SFT) depends on which data format?

  1. Labeled input-output examples
  2. Only unlabeled web crawl text
  3. Only telemetry logs
  4. Only embeddings without text
Show answer

Correct: Labeled input-output examples

SFT uses curated labeled examples to shape task behavior and response style.
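SFT datasets are commonly stored as JSON Lines with one input-output pair per line; a sketch with hypothetical records and field names:

```python
import json

# Hypothetical SFT records: each example pairs an input with a target output.
sft_records = [
    {"prompt": "Summarize: GPUs accelerate training.", "response": "GPUs speed up training."},
    {"prompt": "Translate to French: hello", "response": "bonjour"},
]

# One JSON object per line (JSONL), a common storage format for SFT data.
jsonl = "\n".join(json.dumps(r) for r in sft_records)
first = json.loads(jsonl.splitlines()[0])
print(sorted(first.keys()))  # ['prompt', 'response']
```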

Set 1 · Question 13 · Chapter 3

What is the main purpose of instruction tuning?

  1. Increase PCIe bandwidth
  2. Improve instruction following and response formatting
  3. Replace tokenizer training
  4. Disable model regularization
Show answer

Correct: Improve instruction following and response formatting

Instruction tuning improves how reliably the model follows user intent and format constraints.

Set 1 · Question 14 · Chapter 3

What does data parallelism do?

  1. Splits one model layer across GPUs
  2. Replicates model copies and splits batches across workers
  3. Caches retrieval chunks in vector DB
  4. Converts FP16 to INT8 at runtime
Show answer

Correct: Replicates model copies and splits batches across workers

Data parallelism keeps model replicas on workers and distributes input batches.

Set 1 · Question 15 · Chapter 3

When is model parallelism most needed?

  1. When a model does not fit on a single device
  2. When prompts are short
  3. When using only CPU inference
  4. When no checkpointing is required
Show answer

Correct: When a model does not fit on a single device

Model parallelism partitions large models across multiple devices when single-device memory is insufficient.

Set 1 · Question 16 · Chapter 3

What is a common benefit of mixed precision training?

  1. Lower memory use and faster training
  2. Guaranteed best perplexity
  3. Automatic bias removal
  4. No need for optimizer tuning
Show answer

Correct: Lower memory use and faster training

FP16/BF16 usually reduce memory pressure and improve throughput.

Set 1 · Question 17 · Chapter 3

What problem does gradient accumulation solve?

  1. Merges vector databases
  2. Emulates larger effective batch size with limited memory
  3. Adds long-context support
  4. Creates synthetic labels
Show answer

Correct: Emulates larger effective batch size with limited memory

It accumulates gradients over micro-batches before update steps.
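A minimal sketch, with gradients as plain float lists: averaging over micro-batch gradients before the optimizer step reproduces the gradient of one larger batch.

```python
def accumulated_update(micro_batch_grads, accum_steps):
    """Average gradients over micro-batches, then take one optimizer step."""
    assert len(micro_batch_grads) == accum_steps
    total = [0.0] * len(micro_batch_grads[0])
    for grads in micro_batch_grads:          # one forward/backward per micro-batch
        total = [t + g for t, g in zip(total, grads)]
    return [t / accum_steps for t in total]  # equals the big-batch gradient

# Four micro-batches emulate one batch four times larger.
update = accumulated_update([[1.0, 2.0], [3.0, 4.0], [1.0, 0.0], [3.0, 2.0]], 4)
print(update)  # [2.0, 2.0]
```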

Set 1 · Question 18 · Chapter 3

Why is checkpointing essential during long training runs?

  1. It guarantees fairness compliance
  2. It enables recovery and reproducibility
  3. It replaces evaluation metrics
  4. It reduces tokenizer size
Show answer

Correct: It enables recovery and reproducibility

Checkpoints allow restart after failure and provide auditable run artifacts.

Set 1 · Question 19 · Chapter 3

What distinguishes AdamW from classic Adam in practice?

  1. Decoupled weight decay from gradient update
  2. No learning rate required
  3. Only works on CPUs
  4. Cannot train transformers
Show answer

Correct: Decoupled weight decay from gradient update

AdamW applies weight decay in a decoupled way and is widely used in modern training pipelines.
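A scalar sketch of one AdamW step showing the decoupling: weight decay is applied to the weight directly rather than folded into the gradient that feeds the Adam moment estimates (hyperparameter values are illustrative defaults):

```python
import math

def adamw_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One scalar AdamW step with decoupled weight decay."""
    m = b1 * m + (1 - b1) * grad           # first-moment estimate
    v = b2 * v + (1 - b2) * grad * grad    # second-moment estimate
    m_hat = m / (1 - b1 ** t)              # bias correction
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)  # Adam update
    w = w - lr * weight_decay * w                  # decoupled decay on w itself
    return w, m, v

w, m, v = adamw_step(w=1.0, grad=0.5, m=0.0, v=0.0, t=1)
print(w)  # slightly below 1.0 after one update plus decay
```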

Set 1 · Question 20 · Chapter 3

Why is learning-rate warmup commonly used at training start?

  1. To stabilize early optimization before full step sizes
  2. To freeze all model layers permanently
  3. To increase tokenizer vocabulary quickly
  4. To bypass gradient computation
Show answer

Correct: To stabilize early optimization before full step sizes

Warmup prevents unstable early updates by ramping learning rate gradually.
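A sketch of a linear warmup schedule (base rate and step count are hypothetical values):

```python
def lr_with_warmup(step, base_lr=3e-4, warmup_steps=100):
    """Linear warmup: ramp from near 0 to base_lr, then hold."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr

print(lr_with_warmup(0))    # tiny first step
print(lr_with_warmup(49))   # halfway through warmup
print(lr_with_warmup(500))  # full base_lr after warmup
```

In practice the post-warmup phase is usually a decay schedule (cosine or linear) rather than a constant rate.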

Set 2

Set 2 · Question 1 · Chapter 4

When does few-shot prompting usually help most?

  1. When output format consistency is important
  2. When you want to disable system prompts
  3. When training from scratch
  4. When reducing GPU clock speed
Show answer

Correct: When output format consistency is important

Few-shot examples anchor format and behavior for tasks with strict output expectations.

Set 2 · Question 2 · Chapter 4

What is the role of a system prompt in an LLM application?

  1. High-level behavior and policy instruction layer
  2. Vector index storage layer
  3. GPU scheduling layer
  4. Checkpoint serialization format
Show answer

Correct: High-level behavior and policy instruction layer

System prompts define global instructions and boundaries for model behavior.

Set 2 · Question 3 · Chapter 4

What effect does increasing temperature generally have?

  1. More deterministic outputs
  2. More sampling randomness
  3. Lower token count limits
  4. Higher embedding dimensions
Show answer

Correct: More sampling randomness

Higher temperature increases variability in token sampling.
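Temperature divides the logits before the softmax; a sketch with hypothetical logits showing how low temperature sharpens the distribution and high temperature flattens it:

```python
import math

def softmax_with_temperature(logits, temperature):
    scaled = [l / temperature for l in logits]
    m = max(scaled)                        # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cool = softmax_with_temperature(logits, 0.5)  # sharper: top token dominates
warm = softmax_with_temperature(logits, 2.0)  # flatter: more randomness
print(round(cool[0], 3), round(warm[0], 3))   # top token far likelier when cool
```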

Set 2 · Question 4 · Chapter 4

What does top-k sampling do?

  1. Selects from top k probable tokens each step
  2. Selects from tokens until cumulative probability p
  3. Always selects the maximum probability token
  4. Selects random tokens from full vocabulary
Show answer

Correct: Selects from top k probable tokens each step

Top-k restricts sampling to a fixed-size candidate set of likely tokens.
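A sketch of the filtering step: keep the k most probable tokens and renormalize before sampling (probabilities are hypothetical):

```python
def top_k_filter(probs, k):
    """Keep only the k most probable tokens and renormalize their mass."""
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return {i: probs[i] / total for i in top}

probs = [0.5, 0.3, 0.15, 0.05]
candidates = top_k_filter(probs, k=2)
print(candidates)  # only tokens 0 and 1 remain, renormalized to sum to 1
```

Top-p (nucleus) sampling differs in that the candidate set grows or shrinks until cumulative probability reaches p, rather than being a fixed size.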

Set 2 · Question 5 · Chapter 4

What is a common beam search tradeoff?

  1. Higher diversity with lower compute
  2. Potentially better sequence likelihood with higher compute cost
  3. No dependence on logits
  4. Eliminates need for prompt engineering
Show answer

Correct: Potentially better sequence likelihood with higher compute cost

Beam search explores multiple candidate sequences but costs more compute.

Set 2 · Question 6 · Chapter 4

Which setup is most deterministic for repeated runs?

  1. Greedy decoding with fixed prompts
  2. High temperature with top-p
  3. Random seed omitted with sampling
  4. Few-shot with stochastic decoding
Show answer

Correct: Greedy decoding with fixed prompts

Greedy decoding avoids random sampling and is typically repeatable with fixed conditions.

Set 2 · Question 7 · Chapter 4

Which control best reduces prompt injection risk from retrieved documents?

  1. Treat retrieved content as untrusted and enforce policy boundaries
  2. Increase max tokens
  3. Remove system prompts entirely
  4. Disable metadata filters
Show answer

Correct: Treat retrieved content as untrusted and enforce policy boundaries

Injection defense depends on trust boundaries, sanitization, and strict policy enforcement.

Set 2 · Question 8 · Chapter 4

Why use structured output schemas in production?

  1. To improve downstream parsing reliability
  2. To train larger foundation models
  3. To avoid evaluation completely
  4. To disable function calling
Show answer

Correct: To improve downstream parsing reliability

Schema-constrained outputs reduce parser failures and integration errors.

Set 2 · Question 9 · Chapter 4

What is the main purpose of function calling in LLM workflows?

  1. Map model output to explicit tool/action interfaces
  2. Increase parameter count
  3. Compress checkpoints
  4. Replace retrieval systems
Show answer

Correct: Map model output to explicit tool/action interfaces

Function calling turns model intent into structured, callable actions.

Set 2 · Question 10 · Chapter 5

What core problem does RAG solve?

  1. Grounding responses with external knowledge
  2. Replacing tokenization
  3. Eliminating need for inference optimization
  4. Automatically tuning reward models
Show answer

Correct: Grounding responses with external knowledge

RAG connects generation to retrievable evidence for fresher and auditable answers.

Set 2 · Question 11 · Chapter 5

What is an embedding in RAG systems?

  1. A dense vector representing semantic meaning
  2. A compressed checkpoint archive
  3. A decoding temperature schedule
  4. A GPU utilization metric
Show answer

Correct: A dense vector representing semantic meaning

Embeddings map text (or other data) into vectors used for similarity search.
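Similarity search over embeddings typically uses cosine similarity; a sketch with hypothetical 3-dimensional vectors (real embedding models use hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

query = [0.9, 0.1, 0.0]
doc_related = [0.8, 0.2, 0.1]
doc_unrelated = [0.0, 0.1, 0.9]
# The semantically closer document scores higher similarity.
print(cosine_similarity(query, doc_related) > cosine_similarity(query, doc_unrelated))  # True
```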

Set 2 · Question 12 · Chapter 5

Why use overlapping chunks when indexing documents?

  1. To preserve context across chunk boundaries
  2. To reduce vocabulary size
  3. To force deterministic decoding
  4. To disable reranking
Show answer

Correct: To preserve context across chunk boundaries

Overlap helps avoid losing important context that spans adjacent segments.
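A minimal character-level chunker sketch (production pipelines usually chunk by tokens or sentences; sizes here are illustrative):

```python
def chunk_text(text, chunk_size=20, overlap=5):
    """Fixed-size chunks with overlap so context spans chunk boundaries."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece:
            chunks.append(piece)
        if start + chunk_size >= len(text):
            break
    return chunks

chunks = chunk_text("Retrieval works best when chunks keep their context.", 20, 5)
# Each chunk repeats the last 5 characters of the previous chunk.
print(chunks[0][-5:] == chunks[1][:5])  # True
```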

Set 2 · Question 13 · Chapter 5

What is the benefit of hybrid search (keyword + semantic)?

  1. Better recall and precision balance
  2. Lower need for embeddings
  3. No need for chunking strategy
  4. Guaranteed zero latency
Show answer

Correct: Better recall and precision balance

Combining lexical and semantic retrieval can improve both coverage and relevance.

Set 2 · Question 14 · Chapter 5

What is reranking used for in a retrieval pipeline?

  1. Improve relevance ordering among initial retrieved candidates
  2. Convert vectors to images
  3. Increase context window limit
  4. Train tokenizer merge rules
Show answer

Correct: Improve relevance ordering among initial retrieved candidates

Reranking refines candidate ordering so better evidence reaches generation.

Set 2 · Question 15 · Chapter 5

Why apply metadata filtering during retrieval?

  1. Limit retrieval to valid source scope or tenant boundaries
  2. Increase model width
  3. Reduce prompt template size
  4. Bypass access control
Show answer

Correct: Limit retrieval to valid source scope or tenant boundaries

Metadata constraints enforce relevance and governance requirements.

Set 2 · Question 16 · Chapter 6

What is PEFT primarily optimizing for?

  1. Lower adaptation cost with smaller trainable parameter sets
  2. Higher tokenization speed only
  3. Replacing evaluation metrics
  4. Removing all adaptation artifacts
Show answer

Correct: Lower adaptation cost with smaller trainable parameter sets

PEFT updates a small subset of parameters to reduce training and storage cost.

Set 2 · Question 17 · Chapter 6

What is the key idea behind LoRA updates?

  1. Low-rank trainable matrices approximate weight deltas
  2. Replace attention with convolutions
  3. Remove residual connections
  4. Use only greedy decoding
Show answer

Correct: Low-rank trainable matrices approximate weight deltas

LoRA learns compact low-rank updates rather than full-matrix modifications.
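A tiny numeric sketch of the LoRA idea: the frozen weight W gets an additive update (alpha/r)·BA, where B and A are small trainable matrices of rank r (all values here are hypothetical):

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# Frozen 2x2 base weight; LoRA trains only B (2x1) and A (1x2): rank r = 1,
# so 4 trainable values instead of 4 full weights -- the gap grows with size.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[0.5], [0.25]]
A = [[0.2, 0.4]]
alpha, r = 2.0, 1

delta = matmul(B, A)  # rank-1 weight delta
W_adapted = [[w + (alpha / r) * d for w, d in zip(wr, dr)]
             for wr, dr in zip(W, delta)]
print(W_adapted)
```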

Set 2 · Question 18 · Chapter 6

What does prompt tuning modify?

  1. Virtual prompt embeddings rather than core model weights
  2. All transformer layers
  3. Vector database indexes
  4. GPU firmware
Show answer

Correct: Virtual prompt embeddings rather than core model weights

Prompt tuning learns input-side embeddings and keeps most model weights frozen.

Set 2 · Question 19 · Chapter 6

What is a practical risk of model merging?

  1. Behavior regressions without strong validation
  2. Guaranteed incompatibility with adapters
  3. Inability to run on GPUs
  4. Automatic policy compliance
Show answer

Correct: Behavior regressions without strong validation

Merged checkpoints can introduce unexpected quality or safety regressions.

Set 2 · Question 20 · Chapter 6

Which factor most influences adaptation method choice?

  1. Quality target, compute budget, and deployment constraints
  2. Color theme of UI dashboard
  3. Number of newsletter subscribers
  4. CPU fan profile
Show answer

Correct: Quality target, compute budget, and deployment constraints

Method selection is a tradeoff between required quality and operational cost/complexity.

Set 3

Set 3 · Question 1 · Chapter 7

What is the reward model output used for in RLHF?

  1. Preference-aligned quality scores for candidate responses
  2. Direct token generation
  3. Embedding similarity only
  4. GPU allocation schedules
Show answer

Correct: Preference-aligned quality scores for candidate responses

The reward model estimates preference quality and guides policy optimization.

Set 3 · Question 2 · Chapter 7

Which step comes before policy optimization in a basic RLHF pipeline?

  1. Train reward model from preference data
  2. Quantize deployment model
  3. Build vector index
  4. Run load test
Show answer

Correct: Train reward model from preference data

Preference data is used to train reward scoring before updating policy behavior.

Set 3 · Question 3 · Chapter 7

Which failure mode is associated with over-optimizing reward models?

  1. Reward hacking
  2. Tokenizer dropout
  3. Context window overflow
  4. Kernel panic
Show answer

Correct: Reward hacking

Models may exploit reward-model shortcuts rather than improving true usefulness.

Set 3 · Question 4 · Chapter 7

What does Constitutional AI emphasize?

  1. Principle-based self-critique and revision
  2. Only supervised classification
  3. Only rule-based chatbot behavior
  4. Removing human evaluation
Show answer

Correct: Principle-based self-critique and revision

Constitutional AI uses explicit principles to critique and refine responses.

Set 3 · Question 5 · Chapter 7

What is the objective of safety alignment?

  1. Reduce unsafe or policy-violating outputs
  2. Maximize context window size
  3. Increase model parameter count
  4. Eliminate all refusals
Show answer

Correct: Reduce unsafe or policy-violating outputs

Safety alignment steers behavior toward safer, policy-consistent outputs.

Set 3 · Question 6 · Chapter 8

Perplexity is primarily used to measure what?

  1. How well a model predicts token sequences
  2. GPU memory temperature
  3. Prompt injection risk
  4. RAG retrieval freshness
Show answer

Correct: How well a model predicts token sequences

Perplexity reflects predictive uncertainty over token sequences.
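Perplexity is the exponential of the average negative log-likelihood over predicted tokens; a sketch with hypothetical per-token probabilities:

```python
import math

def perplexity(token_probs):
    """exp of the mean negative log-likelihood over predicted tokens."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

confident = perplexity([0.9, 0.8, 0.95])  # model predicts tokens well
uncertain = perplexity([0.1, 0.2, 0.05])  # model is frequently surprised
print(confident < uncertain)  # True -> lower perplexity is better
```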

Set 3 · Question 7 · Chapter 8

Which metric family is commonly used in summarization evaluation?

  1. ROUGE
  2. IoU
  3. AUC-ROC
  4. PSNR
Show answer

Correct: ROUGE

ROUGE compares overlap patterns and is a common summarization baseline metric.

Set 3 · Question 8 · Chapter 8

What does tokens/sec indicate in inference monitoring?

  1. Generation throughput speed
  2. Ground-truth factuality
  3. Dataset labeling quality
  4. Prompt template complexity
Show answer

Correct: Generation throughput speed

Tokens/sec measures how fast the system generates output tokens.

Set 3 · Question 9 · Chapter 8

Which pair best captures a common serving tradeoff?

  1. Higher batching can increase throughput but add latency
  2. Higher throughput always lowers latency
  3. Lower latency always lowers cost
  4. Higher accuracy always lowers GPU use
Show answer

Correct: Higher batching can increase throughput but add latency

Batching improves utilization/throughput but queueing can increase response delay.

Set 3 · Question 10 · Chapter 8

What is the purpose of hallucination detection checks?

  1. Find plausible outputs unsupported by evidence
  2. Increase context window length
  3. Compress model checkpoints
  4. Expand tokenizer vocabulary
Show answer

Correct: Find plausible outputs unsupported by evidence

Hallucination checks target unsupported claims that sound confident but are not grounded.

Set 3 · Question 11 · Chapter 8

Robustness testing is designed to evaluate what?

  1. Stability under noisy or shifted inputs
  2. Only average BLEU improvement
  3. Only training duration
  4. Only GPU purchase cost
Show answer

Correct: Stability under noisy or shifted inputs

Robustness evaluates how performance holds under input variation and distribution shift.

Set 3 · Question 12 · Chapter 8

What makes adversarial testing different from routine regression tests?

  1. It intentionally uses difficult or malicious inputs
  2. It ignores safety outcomes
  3. It runs only once per year
  4. It does not require expected behaviors
Show answer

Correct: It intentionally uses difficult or malicious inputs

Adversarial tests intentionally stress weak points in behavior and controls.

Set 3 · Question 13 · Chapter 9

Which practice helps reduce bias risk early in the lifecycle?

  1. Dataset auditing and balanced sampling
  2. Disabling evaluation pipelines
  3. Removing all metadata
  4. Increasing temperature
Show answer

Correct: Dataset auditing and balanced sampling

Data quality and representation checks are foundational bias mitigation controls.

Set 3 · Question 14 · Chapter 9

Where is toxicity detection most useful in an LLM app pipeline?

  1. Both input and output filtering stages
  2. Only before tokenization
  3. Only during model pretraining
  4. Only on infrastructure logs
Show answer

Correct: Both input and output filtering stages

Applying toxicity checks at ingress and egress reduces harmful content risk.

Set 3 · Question 15 · Chapter 9

What are guardrails in this course context?

  1. Policy and rule layers constraining model and tool behavior
  2. GPU chassis rails
  3. Learning-rate decay formulas
  4. Vector embedding norms
Show answer

Correct: Policy and rule layers constraining model and tool behavior

Guardrails enforce operational and policy boundaries around model actions.

Set 3 · Question 16 · Chapter 9

What best describes a jailbreak attempt?

  1. An adversarial prompt trying to bypass safety policy
  2. A failed checkpoint restore
  3. A tokenizer training bug
  4. A cloud autoscaling delay
Show answer

Correct: An adversarial prompt trying to bypass safety policy

Jailbreak prompts attempt to circumvent refusal and safety constraints.

Set 3 · Question 17 · Chapter 9

Which control is central to protecting sensitive user data?

  1. Access controls plus data minimization and redaction
  2. Higher top-k sampling
  3. Longer prompts
  4. Lower beam width
Show answer

Correct: Access controls plus data minimization and redaction

Privacy protection depends on strict data handling controls, not decoding settings.

Set 3 · Question 18 · Chapter 9

What does model governance primarily provide?

  1. Ownership, approvals, audit trails, and change control
  2. Automatic context expansion
  3. Guaranteed low latency
  4. Prompt style personalization
Show answer

Correct: Ownership, approvals, audit trails, and change control

Governance establishes accountability and traceability across model lifecycle decisions.

Set 3 · Question 19 · Chapter 9

Why are compliance requirements not one-size-fits-all?

  1. Requirements vary by region and industry
  2. LLMs remove legal obligations
  3. Only cloud providers define compliance
  4. Compliance applies only to training
Show answer

Correct: Requirements vary by region and industry

Regulatory obligations differ by jurisdiction, sector, and data sensitivity.

Set 3 · Question 20 · Chapter 7

What is the operational value of feedback loops after deployment?

  1. Continuous correction and iterative behavior improvement
  2. Permanent model freeze
  3. Elimination of human review
  4. Removal of monitoring
Show answer

Correct: Continuous correction and iterative behavior improvement

Feedback loops close the gap between observed production behavior and target behavior.

Set 4

Set 4 · Question 1 · Chapter 10

What defines a multimodal model?

  1. A model that can process or generate across multiple data modalities
  2. A model that uses only text
  3. A model that runs only on CPUs
  4. A model trained without embeddings
Show answer

Correct: A model that can process or generate across multiple data modalities

Multimodal systems handle combinations of text, image, audio, or video.

Set 4 · Question 2 · Chapter 10

Which use case is most directly aligned with vision-language models?

  1. Answering questions about an image
  2. Replacing distributed training
  3. Compiling CUDA kernels
  4. Computing BLEU for translation
Show answer

Correct: Answering questions about an image

VLMs jointly reason across visual and textual inputs.

Set 4 · Question 3 · Chapter 10

How do diffusion models generate samples at a high level?

  1. Iterative denoising from noise to structured output
  2. Direct nearest-neighbor retrieval
  3. Rule-based template fill
  4. Single-step deterministic mapping only
Show answer

Correct: Iterative denoising from noise to structured output

Diffusion models repeatedly denoise latent noise into coherent outputs.

Set 4 · Question 4 · Chapter 10

What is the core training setup in GANs?

  1. Generator versus discriminator competition
  2. Teacher-student distillation
  3. Prompt-only adaptation
  4. Reinforcement learning with human feedback
Show answer

Correct: Generator versus discriminator competition

GANs use adversarial training between a generator and discriminator.

Set 4 · Question 5 · Chapter 10

Why are cross-modal embeddings useful?

  1. They enable retrieval across text and media in a shared space
  2. They remove the need for metadata
  3. They eliminate token limits
  4. They guarantee legal compliance
Show answer

Correct: They enable retrieval across text and media in a shared space

Shared embedding spaces connect semantics across different modalities.

Set 4 · Question 6 · Chapter 10

What task does image captioning perform?

  1. Generate descriptive text from visual input
  2. Convert text prompts into videos
  3. Rank retrieval documents
  4. Train optimizer schedules
Show answer

Correct: Generate descriptive text from visual input

Image captioning maps visual content into textual descriptions.

Set 4 · Question 7 · Chapter 11

What is a primary effect of inference quantization?

  1. Lower precision can improve speed and reduce memory use
  2. Automatically improves every accuracy metric
  3. Removes need for serving infrastructure
  4. Disables KV caching
Show answer

Correct: Lower precision can improve speed and reduce memory use

Quantization reduces numeric precision to improve efficiency, with quality tradeoffs to validate.
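A minimal sketch of symmetric int8 quantization, the simplest scheme: floats are mapped to integers in [-127, 127] through a single scale factor, and dequantizing recovers approximate values.

```python
def quantize_int8(values):
    """Symmetric int8 quantization: map floats to [-127, 127] via one scale."""
    scale = max(abs(v) for v in values) / 127.0
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

weights = [0.02, -0.5, 0.31, 1.27]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Values come back close to, but not exactly, the originals -- the
# precision/quality tradeoff that quantization requires teams to validate.
print(max(abs(w - r) for w, r in zip(weights, restored)) < scale)  # True
```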

Set 4 · Question 8 · Chapter 11

What is pruning intended to do?

  1. Remove less important parameters to reduce serving cost
  2. Increase prompt length
  3. Add new attention heads
  4. Change tokenization strategy
Show answer

Correct: Remove less important parameters to reduce serving cost

Pruning targets redundant parameters to improve efficiency.

Set 4 · Question 9 · Chapter 11

How does knowledge distillation usually work?

  1. A smaller student model learns behavior from a larger teacher
  2. A tokenizer learns from beam search
  3. A retriever learns from CUDA kernels
  4. A scheduler learns from invoices
Show answer

Correct: A smaller student model learns behavior from a larger teacher

Distillation transfers useful behavior into a smaller, cheaper model.

Set 4 · Question 10 · Chapter 11

What is TensorRT used for in this stack?

  1. Optimizing inference execution on NVIDIA hardware
  2. Creating annotation guidelines
  3. Building legal compliance checklists
  4. Generating synthetic datasets
Show answer

Correct: Optimizing inference execution on NVIDIA hardware

TensorRT compiles and optimizes inference graphs/kernels for GPU performance.

Set 4 · Question 11 · Chapter 11

Which approach typically improves perceived latency for chat users?

  1. Streaming inference with incremental token output
  2. Very large batch inference only
  3. Offline-only generation
  4. Disabling KV cache
Show answer

Correct: Streaming inference with incremental token output

Streaming returns partial output quickly, improving user-perceived responsiveness.

Set 4 · Question 12 · Chapter 11

What does autoscaling address in production inference systems?

  1. Dynamic capacity adjustment based on load
  2. Automatic prompt rewriting
  3. Automatic reward model retraining
  4. Automatic tokenizer merges
Show answer

Correct: Dynamic capacity adjustment based on load

Autoscaling increases or decreases serving resources as demand changes.

Set 4 · Question 13 · Chapter 11

What is NVIDIA Triton Inference Server primarily for?

  1. Serving models in production with backend/runtime integration
  2. Collecting human preference labels
  3. Training foundation models from scratch
  4. Building ETL transformations
Show answer

Correct: Serving models in production with backend/runtime integration

Triton is a production serving platform supporting multiple model backends.

Set 4 · Question 14 · Chapter 11

How is NVIDIA NIM best described at a high level?

  1. Packaged inference microservices for simpler deployment
  2. A BLEU-like evaluation metric
  3. A tokenization standard
  4. A database migration tool
Show answer

Correct: Packaged inference microservices for simpler deployment

NIM packages model serving components to speed practical deployment.

Set 4 · Question 15 · Chapter 11

What does GPU memory management aim to prevent?

  1. Out-of-memory failures and fragmentation-related instability
  2. Any need for observability
  3. Any need for model versioning
  4. Any need for access control
Show answer

Correct: Out-of-memory failures and fragmentation-related instability

Managing allocation and fragmentation is key for stable high-throughput serving.

Set 4 · Question 16 · Chapter 12

In ETL, which step transforms raw source data into usable format?

  1. Transform
  2. Extract
  3. Load
  4. Archive
Show answer

Correct: Transform

Extract pulls data, transform reshapes/cleans it, and load stores processed outputs.

Set 4 · Question 17 · Chapter 12

Why is dataset versioning important for ML/LLM systems?

  1. It enables reproducibility and auditability
  2. It removes need for evaluation
  3. It guarantees fairness
  4. It increases context window size
Show answer

Correct: It enables reproducibility and auditability

Versioned datasets let teams reproduce results and trace changes over time.

Set 4 · Question 18 · Chapter 12

What should experiment tracking capture for each run?

  1. Parameters, metrics, artifacts, and run context
  2. Only final accuracy
  3. Only model file size
  4. Only deployment region
Show answer

Correct: Parameters, metrics, artifacts, and run context

Complete run context is required for repeatability and debugging.

Set 4 · Question 19 · Chapter 12

What does drift detection monitor after deployment?

  1. Shifts in data distributions or model behavior
  2. Only fan speed and power draw
  3. Only source code comments
  4. Only monthly subscription count
Show answer

Correct: Shifts in data distributions or model behavior

Drift checks detect changing conditions that can silently degrade performance.

Set 4 · Question 20 · Chapter 12

What is the purpose of CI/CD for ML in this course context?

  1. Automate validation, packaging, and release of model changes
  2. Replace human governance entirely
  3. Disable monitoring to reduce cost
  4. Use only manual deployment
Show answer

Correct: Automate validation, packaging, and release of model changes

CI/CD for ML standardizes safe, repeatable model release workflows.
