15 Mar 2026
Dense vs MoE: IB and GPU Communication Patterns
HPC/AI @Microsoft

Introduction

In my previous post, I showed that Mixtral 8x7B (MoE) requires 56× more effective interconnect bandwidth...
The Problem: Every Node Needs the Same Data

When you scale a GPU cluster beyond a single node, you immediately hit a...
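Since the post is about communication patterns, a concrete sketch helps. Below is a minimal PyTorch/torch.distributed example (my own illustration, not the benchmark code behind the numbers above) contrasting the dense pattern, a single all-reduce over partial activations per layer, with the MoE pattern, one all-to-all to dispatch tokens to expert ranks and a second all-to-all to bring the outputs back. The sizes (hidden=4096, tokens=1024) and the stand-in expert computation are arbitrary assumptions.

```python
# Illustrative sketch only: contrasts the dense all-reduce pattern with the
# MoE all-to-all dispatch/combine pattern. Launch with, e.g.:
#   torchrun --nproc_per_node=8 patterns.py
import os

import torch
import torch.distributed as dist


def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    device = torch.device("cuda")

    hidden = 4096   # hidden size (arbitrary, for illustration)
    tokens = 1024   # tokens per rank (must be divisible by world size here)

    # Dense / tensor-parallel pattern: each rank holds a partial activation,
    # and a single all-reduce per layer sums them across ranks.
    partial = torch.randn(tokens, hidden, device=device)
    dist.all_reduce(partial, op=dist.ReduceOp.SUM)

    # MoE pattern: two all-to-alls per layer. The first dispatches each
    # rank's tokens to the ranks hosting their routed experts; the second
    # returns the expert outputs to the originating ranks. Equal splits are
    # used here for simplicity; a real router produces uneven splits.
    send = torch.randn(tokens, hidden, device=device)
    recv = torch.empty_like(send)
    dist.all_to_all_single(recv, send)        # dispatch tokens to expert ranks
    expert_out = recv * 2.0                   # stand-in for the expert MLP
    dist.all_to_all_single(send, expert_out)  # combine outputs back

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

The contrast to notice: the dense layer crosses the interconnect once per layer in a bandwidth-friendly all-reduce, while the MoE layer crosses it twice with traffic that depends on routing, which is one reason effective interconnect bandwidth matters so much more for MoE.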