18 Mar 2026
Three VMSS Uniform Settings for HPC/AI Users
If you deploy GPU VMs on Azure using VMSS Uniform mode, three settings can make the difference between a successful a...
HPC/AI @Microsoft
We implement a ring allreduce algorithm from scratch in Python, run it on 16 NVIDIA H100 GPUs across 2 nodes with Inf...
Hands-on experiments on an NVIDIA H100 GPU reveal why KV cache — not model weights — dominates GPU memory during infe...
When scaling down an AKS node pool, which node gets deleted? The answer depends on how you trigger the scale-in and w...
TL;DR — We benchmarked NCCL Ring, Tree, and Default allreduce algorithms across 1–8 nodes (8–64 H100 GPUs) on Azure N...
A single thermally throttled GPU, one out of sixteen, can cut your distributed training throughput by...