Gowtham Ramu
Author, Trainer and Advisor
Long-form engineering notes
Newsletters below are published on dcops.ai and LinkedIn.
AI systems are becoming memory-bound. NVIDIA ICMS and BlueField-4 introduce a new context tier that changes GPU datacenter design.
The core change is not just faster cuDF. Distributed GPU data systems are converging toward GPU-native collective communication.
How to choose the right abstraction for AI infrastructure: container-native GPU stacks vs KubeVirt VM isolation.
Demand, salary ranges, and skill priorities for AI infrastructure roles across HPC-style GPU clusters.
A practical runbook for executing a train-then-retrain cycle inside a 24-hour AWS P6e UltraServer capacity block using EKS, DRA/IMEX, and EFA.
How Rubin/NVL72 shifts Kubernetes realities: topology-aware scheduling, NCCL stability, DPU offload, CPU isolation, and multi-tenancy.
A system-architecture view: cabling, firmware lifecycle, scheduler blind spots, deterministic fabrics, DPU offload, and failure domains.
Bio
Gowtham Ramu
Author, Trainer and Advisor
Preetham Ramu
Co-Author and Trainer