
Newsletter #05: Containers vs KubeVirt for GPU Workloads

How to choose the right abstraction for AI infrastructure: container-native GPU stacks vs KubeVirt VM isolation.

2026-02-19

How to choose the right abstraction for AI infrastructure

The conversation around AI infrastructure is rapidly shifting from “which model?” to “which runtime and platform?”

For engineers building GPU clusters, a fundamental architectural decision appears early:

Should GPU workloads run in containers, or inside virtual machines using KubeVirt?

Both approaches are valid.

But from a GPU performance, scheduling, and operational perspective, they behave very differently.

This article breaks down the trade-offs with a focus on real AI and HPC workloads, not just generic Kubernetes discussions.


The core difference: process vs hardware isolation

At a high level:

| Aspect | Containers | KubeVirt (VMs on Kubernetes) |
| --- | --- | --- |
| Isolation model | Process-level | Hardware-level (virtual machine) |
| Runtime | containerd / CRI-O | QEMU/KVM via KubeVirt |
| Overhead | Minimal | Higher, due to the hypervisor |
| GPU access | Direct via device plugin | PCI passthrough or vGPU |

Containers share the host kernel.

KubeVirt creates a full virtual machine, including a guest OS, inside Kubernetes.

That architectural difference directly impacts how the GPU is accessed.


How the GPU is reached

Container path

In a containerized workload, the GPU is exposed through the NVIDIA device plugin.

Application
  -> CUDA
    -> Container runtime
      -> Host kernel
        -> GPU

There is no hypervisor in the path.

This is why containers deliver near bare-metal GPU performance.
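As a minimal sketch (the pod name, image tag, and entrypoint are placeholders), a containerized workload claims a GPU simply by requesting the extended resource that the NVIDIA device plugin advertises on each node:

```yaml
# Hypothetical pod requesting one GPU through the NVIDIA device plugin.
# The plugin exposes "nvidia.com/gpu" as a schedulable extended resource.
apiVersion: v1
kind: Pod
metadata:
  name: cuda-job                                # placeholder name
spec:
  restartPolicy: Never
  containers:
    - name: trainer
      image: nvcr.io/nvidia/pytorch:24.05-py3   # example image
      command: ["python", "train.py"]           # placeholder entrypoint
      resources:
        limits:
          nvidia.com/gpu: 1   # scheduler binds the pod to a node with a free GPU
```

No hypervisor or guest OS is involved: the scheduler picks a node with a free GPU, and the container talks to it through the host kernel's driver.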

KubeVirt VM path

With KubeVirt, the GPU is typically attached via PCI passthrough or as a mediated device (vGPU).

Application
  -> CUDA
    -> Guest OS kernel
      -> Hypervisor (KVM/QEMU)
        -> Host kernel
          -> GPU

Now the GPU sits behind:

  • A guest OS
  • A hypervisor layer
  • Additional scheduling logic

This adds isolation, but also complexity and overhead.
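By contrast, a KubeVirt VM attaches the GPU through its domain definition. The sketch below assumes the device has already been allowed under `permittedHostDevices` in the KubeVirt CR; the resource name and guest image are examples that must match your hardware:

```yaml
# Hypothetical KubeVirt VirtualMachine with a passthrough GPU.
# Assumes the device is listed under permittedHostDevices in the KubeVirt CR
# and that the deviceName below matches your actual GPU model.
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: gpu-vm                                   # placeholder name
spec:
  runStrategy: Always
  template:
    spec:
      domain:
        devices:
          disks:
            - name: rootdisk
              disk:
                bus: virtio
          gpus:
            - name: gpu1
              deviceName: nvidia.com/GA102GL_A10   # example vendor/device name
        resources:
          requests:
            memory: 16Gi
      volumes:
        - name: rootdisk
          containerDisk:
            image: quay.io/containerdisks/ubuntu:22.04   # example guest image
```

The guest OS then loads its own NVIDIA driver against the passed-through device, which is where the extra layers in the path above come from.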


Performance differences for AI workloads

| Metric | Containers | KubeVirt (VM) |
| --- | --- | --- |
| GPU compute overhead | ~0-2% | ~2-10% |
| Memory bandwidth | Native | Slightly reduced |
| NVLink performance | Full | May degrade |
| NCCL scaling | Optimal | Requires tuning |
| Latency | Lowest | Slightly higher |

For workloads such as:

  • Large language model training
  • Multi-GPU scaling
  • NCCL-heavy jobs
  • GPUDirect RDMA

Containers consistently outperform VM-based approaches.

This is why most hyperscalers and AI labs run container-native GPU stacks.


Multi-GPU and distributed training

Distributed AI training depends heavily on:

  • NCCL performance
  • NVLink topology
  • RDMA fabric efficiency

| Feature | Containers | KubeVirt |
| --- | --- | --- |
| NCCL across GPUs | Native | More complex |
| NVLink awareness | Native | Needs passthrough tuning |
| GPUDirect RDMA | Straightforward | Complex setup |
| Multi-node scaling | Easier | Extra VM networking layers |

For large-scale training, containers are the industry standard.
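A sketch of what that looks like in practice (image, command, and sizes are placeholders): a single-node, multi-GPU training pod requests all eight GPUs and enlarges `/dev/shm`, since NCCL moves data through shared memory and NVLink within the node:

```yaml
# Hypothetical 8-GPU training pod for an NCCL-heavy job.
# The default 64Mi /dev/shm is too small for NCCL buffers, so a
# tmpfs-backed emptyDir is mounted in its place.
apiVersion: v1
kind: Pod
metadata:
  name: nccl-train                              # placeholder name
spec:
  restartPolicy: Never
  containers:
    - name: trainer
      image: nvcr.io/nvidia/pytorch:24.05-py3   # example image
      command: ["torchrun", "--nproc_per_node=8", "train.py"]  # placeholder
      env:
        - name: NCCL_DEBUG
          value: "INFO"       # surface NCCL topology/ring decisions in logs
      resources:
        limits:
          nvidia.com/gpu: 8
      volumeMounts:
        - name: dshm
          mountPath: /dev/shm
  volumes:
    - name: dshm
      emptyDir:
        medium: Memory        # tmpfs-backed /dev/shm for NCCL
```

In a KubeVirt setup, the equivalent tuning happens twice: once in the VM definition and again inside the guest OS, which is part of the added complexity noted above.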


Where KubeVirt makes sense

Despite the performance edge of containers, KubeVirt has a strong value proposition in specific environments.

| Requirement | Better choice |
| --- | --- |
| Strong tenant isolation | KubeVirt |
| Regulated environments | KubeVirt |
| Legacy VM-based ML stacks | KubeVirt |
| GPU-as-a-service platforms | KubeVirt |

Because each workload runs inside its own VM:

  • Isolation is stronger
  • Compliance is easier
  • Noisy-neighbor issues are reduced

This matters in:

  • Financial institutions
  • Government clouds
  • Multi-tenant enterprise clusters

GPU sharing and partitioning

| Feature | Containers | KubeVirt |
| --- | --- | --- |
| MIG support | Native | Supported via passthrough |
| Time slicing | Supported | More complex |
| vGPU (GRID) | Limited | Strong use case |
| Isolation granularity | Moderate | Strong |

KubeVirt is often used in:

  • VDI environments
  • Enterprise GPU clouds
  • Secure multi-tenant platforms
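On the container side, GPU sharing is configured declaratively. A sketch of time slicing with the NVIDIA GPU Operator (the config name, namespace, and replica count are examples; the ConfigMap is referenced from the operator's ClusterPolicy):

```yaml
# Hypothetical GPU Operator time-slicing config: each physical GPU is
# advertised to the scheduler as 4 "nvidia.com/gpu" replicas. Note that
# time slicing shares compute but provides no memory isolation between pods.
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config       # example name, referenced by ClusterPolicy
  namespace: gpu-operator         # example namespace
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4
```

This is the trade-off the table captures: container-side sharing is easy to configure but offers only moderate isolation, whereas vGPU under KubeVirt is harder to set up but isolates tenants at the hardware level.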

Operational complexity

| Aspect | Containers | KubeVirt |
| --- | --- | --- |
| Setup | Simple (GPU Operator) | More complex |
| Debugging | Standard K8s tools | VM + K8s layers |
| Scheduling | Native GPU scheduling | Extra VM logic |
| GPU density | Higher | Lower |

Containers provide:

  • Simpler operations
  • Higher GPU utilization
  • Better scheduling efficiency

Real-world deployment patterns

Container-native GPU clusters

Used for:

  • LLM training
  • Model serving
  • HPC workloads
  • Research clusters
  • AI factories

This is the dominant model in:

  • Hyperscalers
  • AI startups
  • Research institutions

KubeVirt GPU clusters

Used for:

  • Enterprise private clouds
  • Regulated industries
  • Multi-tenant GPU platforms
  • Legacy VM-based workloads

Quick decision guide

| Use case | Recommended approach |
| --- | --- |
| LLM training | Containers |
| High-performance NCCL workloads | Containers |
| GPU microservices | Containers |
| Multi-tenant GPU cloud | KubeVirt |
| Regulated environment | KubeVirt |
| Legacy VM ML apps | KubeVirt |

The industry reality

For modern AI infrastructure:

Containers are the default choice.

They provide:

  • Maximum GPU performance
  • Better scaling
  • Simpler operations
  • Higher utilization

KubeVirt is not a replacement for containers.

It is a complementary model for environments where isolation and compatibility matter more than raw performance.


If you are building or operating GPU clusters today, the real question is not:

“Containers or VMs?”

It is:

“Which workloads truly need VM-grade isolation, and which should run container-native for maximum performance?”

That decision defines the efficiency, cost, and scalability of your AI platform.