20 Apr 2026
Decoding Azure’s NCCL Topology Files
Introduction: Spin up a Standard_ND96isr_H100_v5 VM on Azure, poke around /opt/microsoft/, and you’ll find a curious...
HPC/AI @Microsoft
TL;DR: I captured per-port InfiniBand traffic at 10-millisecond resolution during FSDP fine-tuning of Qwen-7B (dense) ...
If you deploy GPU VMs on Azure using VMSS Uniform mode, three settings can make the difference between a successful a...
We implement a ring allreduce algorithm from scratch in Python, run it on 16 NVIDIA H100 GPUs across 2 nodes with Inf...