How to choose the right abstraction for AI infrastructure
The conversation around AI infrastructure is rapidly shifting from “which model?” to “which runtime and platform?”
For engineers building GPU clusters, a fundamental architectural decision appears early:
Should GPU workloads run in containers, or inside virtual machines using KubeVirt?
Both approaches are valid.
But from a GPU performance, scheduling, and operational perspective, they behave very differently.
This article breaks down the trade-offs with a focus on real AI and HPC workloads, not just generic Kubernetes discussions.
The core difference: process vs hardware isolation
At a high level:
| Aspect | Containers | KubeVirt (VMs on Kubernetes) |
|---|---|---|
| Isolation model | Process-level | Hardware-level (virtual machine) |
| Runtime | containerd / CRI-O | QEMU/KVM via KubeVirt |
| Overhead | Minimal | Higher due to hypervisor |
| GPU access | Direct via device plugin | PCI passthrough or vGPU |
Containers share the host kernel.
KubeVirt creates a full virtual machine, including a guest OS, inside Kubernetes.
That architectural difference directly impacts how the GPU is accessed.
How the GPU is reached
Container path
In a containerized workload, the GPU is exposed through the NVIDIA device plugin.
```
Application
  -> CUDA
  -> Container runtime
  -> Host kernel
  -> GPU
```
There is no hypervisor in the path.
This is why containers deliver near bare-metal GPU performance.
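As a minimal sketch of this path, a pod can request a GPU through the standard `nvidia.com/gpu` resource advertised by the NVIDIA device plugin (the pod name, image, and command here are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-job                # placeholder name
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04   # example CUDA base image
    command: ["nvidia-smi"]     # placeholder workload
    resources:
      limits:
        nvidia.com/gpu: 1       # resource exposed by the NVIDIA device plugin
```

The scheduler places the pod on a node advertising `nvidia.com/gpu`, and the runtime maps the device nodes and driver libraries into the container at startup; no hypervisor sits in the path.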
KubeVirt VM path
With KubeVirt, the GPU is typically attached using PCI passthrough or mediated devices.
```
Application
  -> CUDA
  -> Guest OS kernel
  -> Hypervisor (KVM/QEMU)
  -> Host kernel
  -> GPU
```
Now the GPU sits behind:
- A guest OS
- A hypervisor layer
- Additional scheduling logic
This adds isolation, but also complexity and overhead: with full passthrough the steady-state compute path largely bypasses QEMU, so the cost shows up mainly in interrupt delivery, IOMMU translation, memory mapping, and device setup.
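A minimal sketch of a KubeVirt VM with a passed-through GPU looks like the following. The `deviceName` is an example: the actual resource name is whatever the administrator registers under `permittedHostDevices` in the KubeVirt configuration, and the guest image is a placeholder.

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: gpu-vm                  # placeholder name
spec:
  running: true
  template:
    spec:
      domain:
        devices:
          disks:
          - name: rootdisk
            disk:
              bus: virtio
          gpus:
          - name: gpu1
            deviceName: nvidia.com/GP102GL_TESLA_P40   # example passthrough resource name
        resources:
          requests:
            memory: 8Gi
      volumes:
      - name: rootdisk
        containerDisk:
          image: quay.io/containerdisks/fedora:latest  # example guest OS image
```

The guest then needs its own NVIDIA driver installed, which is part of the extra operational surface this article describes.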
Performance differences for AI workloads
| Metric | Containers | KubeVirt (VM) |
|---|---|---|
| GPU compute overhead | ~0-2% | ~2-10% |
| Memory bandwidth | Native | Slightly reduced |
| NVLink performance | Full | May degrade |
| NCCL scaling | Optimal | Requires tuning |
| Latency | Lowest | Slightly higher |
For workloads such as:
- Large language model training
- Multi-GPU scaling
- NCCL-heavy jobs
- GPUDirect RDMA
containers consistently outperform VM-based approaches.
This is why most hyperscalers and AI labs run container-native GPU stacks.
Multi-GPU and distributed training
Distributed AI training depends heavily on:
- NCCL performance
- NVLink topology
- RDMA fabric efficiency
| Feature | Containers | KubeVirt |
|---|---|---|
| NCCL across GPUs | Native | More complex |
| NVLink awareness | Native | Needs passthrough tuning |
| GPUDirect RDMA | Straightforward | Complex setup |
| Multi-node scaling | Easier | Extra VM networking layers |
For large-scale training, containers are the industry standard.
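To make this concrete, here is a hedged sketch of a single training worker requesting all eight GPUs on a node, so NCCL sees the full NVLink topology. The image, entrypoint, and the RDMA resource name are assumptions: real deployments usually run under a Job, StatefulSet, or a training operator, and the RDMA resource name depends on which RDMA device plugin is installed.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: train-worker-0          # placeholder; real jobs use a Job/StatefulSet or an operator
spec:
  restartPolicy: Never
  containers:
  - name: trainer
    image: my-registry/llm-train:latest                      # placeholder image
    command: ["torchrun", "--nproc_per_node=8", "train.py"]  # placeholder entrypoint
    env:
    - name: NCCL_DEBUG
      value: INFO               # surface NCCL transport/topology decisions in the logs
    resources:
      limits:
        nvidia.com/gpu: 8       # whole node, preserving NVLink topology for NCCL
        rdma/roce_shared: 1     # example RDMA resource; name varies by RDMA device plugin
```

In the container model this is a single resource request; in the VM model the same topology has to survive passthrough configuration and an extra virtual networking layer.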
Where KubeVirt makes sense
Despite the performance edge of containers, KubeVirt has a strong value proposition in specific environments.
| Requirement | Better choice |
|---|---|
| Strong tenant isolation | KubeVirt |
| Regulated environments | KubeVirt |
| Legacy VM-based ML stacks | KubeVirt |
| GPU-as-a-service platforms | KubeVirt |
Because each workload runs inside its own VM:
- Isolation is stronger
- Compliance is easier
- Noisy-neighbor issues are reduced
This matters in:
- Financial institutions
- Government clouds
- Multi-tenant enterprise clusters
GPU sharing and partitioning
| Feature | Containers | KubeVirt |
|---|---|---|
| MIG support | Native | Supported via passthrough |
| Time slicing | Supported | More complex |
| vGPU (GRID) | Limited | Strong use case |
| Isolation granularity | Moderate | Strong |
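On the container side, time slicing is typically configured through the NVIDIA device plugin's sharing settings, managed by the GPU Operator. A sketch (the ConfigMap name and namespace are conventions, not requirements):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config     # referenced from the GPU Operator's ClusterPolicy
  namespace: gpu-operator
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 4           # each physical GPU is advertised as 4 schedulable units
```

Note that time slicing shares compute but provides no memory isolation between the sharing pods, which is exactly the gap the VM-based model closes.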
KubeVirt is often used in:
- VDI environments
- Enterprise GPU clouds
- Secure multi-tenant platforms
Operational complexity
| Aspect | Containers | KubeVirt |
|---|---|---|
| Setup | Simple (GPU Operator) | More complex |
| Debugging | Standard K8s tools | VM + K8s layers |
| Scheduling | Native GPU scheduling | Extra VM logic |
| GPU density | Higher | Lower |
Containers provide:
- Simpler operations
- Higher GPU utilization
- Better scheduling efficiency
Real-world deployment patterns
Container-native GPU clusters
Used for:
- LLM training
- Model serving
- HPC workloads
- Research clusters
- AI factories
This is the dominant model in:
- Hyperscalers
- AI startups
- Research institutions
KubeVirt GPU clusters
Used for:
- Enterprise private clouds
- Regulated industries
- Multi-tenant GPU platforms
- Legacy VM-based workloads
Quick decision guide
| Use case | Recommended approach |
|---|---|
| LLM training | Containers |
| High-performance NCCL workloads | Containers |
| GPU microservices | Containers |
| Multi-tenant GPU cloud | KubeVirt |
| Regulated environment | KubeVirt |
| Legacy VM ML apps | KubeVirt |
The industry reality
For modern AI infrastructure:
Containers are the default choice.
They provide:
- Maximum GPU performance
- Better scaling
- Simpler operations
- Higher utilization
KubeVirt is not a replacement for containers.
It is a complementary model for environments where isolation and compatibility matter more than raw performance.
If you are building or operating GPU clusters today, the real question is not:
“Containers or VMs?”
It is:
“Which workloads truly need VM-grade isolation, and which should run container-native for maximum performance?”
That decision defines the efficiency, cost, and scalability of your AI platform.