13 Mar 2026
Monitoring IB Counters with Prometheus and Grafana
Introduction In my previous post, I showed that InfiniBand delivers 27–57× higher multi-node throughput than Etherne...
HPC/AI @Microsoft
The Problem
The Problem: Every Node Needs the Same Data When you scale a GPU cluster beyond a single node, you immediately hit a...
Introduction In a previous post, I ran Qwen2.5-72B inference on Azure H100 nodes and showed how NVLink’s 900 GB/s ba...
Introduction When serving a large language model across multiple GPUs, the choice of parallelism strategy directly d...