# H100 White Paper
H100 is NVIDIA’s 9th-generation data center GPU. For today’s mainstream AI and HPC models, H100 with InfiniBand interconnect delivers up to 30 times the performance of A100.
Specs | H100 | A100 (80GB) | V100 |
---|---|---|---|
Transistor Count | 80B | 54.2B | 21.1B |
TDP | 700W | 400W | 300/350W |
Manufacturing Process | TSMC 4N | TSMC 7N | TSMC 12nm FFN |
Form Factor | SXM5 | SXM4 | SXM2/SXM3 |
Architecture | Hopper | Ampere | Volta |
FP32 CUDA Cores | 16896 | 6912 | 5120 |
Tensor Cores | 528 | 432 | 640 |
Boost Clock (GHz) | 1.78 | 1.41 | 1.53 |
Memory Type | HBM3 | HBM2e | HBM2 |
Memory Clock (Gbps) | 4.8 | 3.2 | 1.75 |
Memory Bus Width | 5120-bit | 5120-bit | 4096-bit |
Memory Bandwidth (TB/s) | 3 | 2 | 0.9 |
GPU Memory Capacity (GB) | 80 | 80 | 16/32 |
FP32 Vector | 60 TFLOPS | 19.5 TFLOPS | 15.7 TFLOPS |
FP64 Vector | 30 TFLOPS | 9.7 TFLOPS | 7.8 TFLOPS |
INT8 Tensor (dense) | 2000 TOPS | 624 TOPS | NA |
FP16 Tensor (dense) | 1000 TFLOPS | 312 TFLOPS | 125 TFLOPS |
TF32 Tensor (dense) | 500 TFLOPS | 156 TFLOPS | NA |
FP64 Tensor | 60 TFLOPS | 19.5 TFLOPS | NA |
Interconnect | NVLink4 18 Links (900 GB/s) | NVLink3 12 Links (600 GB/s) | NVLink2 6 Links (300 GB/s) |
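As a cross-check, the vector FLOPS rows follow directly from the core counts and boost clocks in the table, counting one FMA as two floating-point operations:

$$
\text{Peak FP32} = N_{\text{FP32 cores}} \times 2\,\tfrac{\text{FLOP}}{\text{cycle}} \times f_{\text{boost}}
$$

H100: $16896 \times 2 \times 1.78\,\text{GHz} \approx 60\,\text{TFLOPS}$; A100: $6912 \times 2 \times 1.41\,\text{GHz} \approx 19.5\,\text{TFLOPS}$; V100: $5120 \times 2 \times 1.53\,\text{GHz} \approx 15.7\,\text{TFLOPS}$.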
- Fourth-generation Tensor Cores
  - Up to 6x faster chip-to-chip than A100, combining the per-SM speedup, the additional SM count, and higher clocks
- 3x faster IEEE FP64 and FP32 processing rates chip-to-chip compared to A100
- New Thread Block Cluster feature, adding another level to the programming hierarchy, which now includes Threads, Thread Blocks, Thread Block Clusters, and Grids; a minimal launch sketch follows below
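A minimal sketch of the cluster level, assuming CUDA 12 and an sm_90 target (`nvcc -arch=sm_90`); the kernel name and buffer are hypothetical. It launches a grid of two blocks grouped into one 2-block cluster and synchronizes across the whole cluster via cooperative groups:

```cuda
#include <cooperative_groups.h>
#include <cstdio>

namespace cg = cooperative_groups;

// Hypothetical kernel: each block in the cluster records its rank,
// then all blocks in the cluster synchronize at a shared barrier.
__global__ void clusterKernel(int *out) {
    cg::cluster_group cluster = cg::this_cluster();
    if (threadIdx.x == 0) {
        out[cluster.block_rank()] = (int)cluster.block_rank();
    }
    cluster.sync();  // barrier across the whole Thread Block Cluster
}

int main() {
    int *out;
    cudaMalloc(&out, 2 * sizeof(int));

    // Launch configuration: grid of 2 blocks, grouped into one 2-block cluster.
    cudaLaunchConfig_t config = {};
    config.gridDim = dim3(2, 1, 1);
    config.blockDim = dim3(128, 1, 1);

    cudaLaunchAttribute attr;
    attr.id = cudaLaunchAttributeClusterDimension;
    attr.val.clusterDim.x = 2;  // 2 thread blocks per cluster
    attr.val.clusterDim.y = 1;
    attr.val.clusterDim.z = 1;
    config.attrs = &attr;
    config.numAttrs = 1;

    cudaLaunchKernelEx(&config, clusterKernel, out);
    cudaDeviceSynchronize();
    cudaFree(out);
    return 0;
}
```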
- Transformer Engine
  - A combination of software and custom Hopper Tensor Core technology
  - Intelligently manages and dynamically chooses between FP8 and 16-bit calculations
  - Up to 9x faster AI training and up to 30x faster AI inference on large language models compared to the prior-generation A100
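Hopper's FP8 comes in two formats, E4M3 (more mantissa bits, finer precision) and E5M2 (more exponent bits, wider range), and the Transformer Engine's per-layer format choice trades between them. A small sketch, assuming the `cuda_fp8.h` header from CUDA 11.8+, showing the precision loss of a round trip through each format:

```cuda
#include <cuda_fp8.h>
#include <cstdio>

int main() {
    float x = 3.14159f;

    // E4M3: 4 exponent bits, 3 mantissa bits -- finer precision, smaller range.
    __nv_fp8_e4m3 a = __nv_fp8_e4m3(x);
    // E5M2: 5 exponent bits, 2 mantissa bits -- coarser precision, wider range.
    __nv_fp8_e5m2 b = __nv_fp8_e5m2(x);

    printf("original:        %f\n", x);
    printf("e4m3 round trip: %f\n", (float)a);
    printf("e5m2 round trip: %f\n", (float)b);
    return 0;
}
```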
- HBM3 memory subsystem
  - Nearly a 2x bandwidth increase over the previous generation
  - 3 TB/sec of memory bandwidth
- 50 MB L2 cache
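The 3 TB/sec figure is consistent with the bus width and pin rate from the table above:

$$
\text{BW}_{\text{H100}} = \frac{5120\,\text{bit} \times 4.8\,\text{Gbps}}{8\,\text{bit/byte}} \approx 3.07\,\text{TB/s},
\qquad
\text{BW}_{\text{A100}} = \frac{5120\,\text{bit} \times 3.2\,\text{Gbps}}{8\,\text{bit/byte}} \approx 2.05\,\text{TB/s}
$$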
- Second-generation Multi-Instance GPU (MIG) technology
  - Approximately 3x more compute capacity and nearly 2x more memory bandwidth per GPU Instance compared to A100
  - Confidential Computing capability with MIG-level Trusted Execution Environments (TEEs)
  - Up to seven individual GPU Instances, each with dedicated NVDEC and NVJPG units
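A minimal sketch, assuming the NVML C API that ships with the driver (compile and link with `-lnvidia-ml`), that queries whether MIG mode is enabled on device 0; error handling is abbreviated:

```c
#include <nvml.h>
#include <stdio.h>

int main(void) {
    nvmlReturn_t rc = nvmlInit_v2();
    if (rc != NVML_SUCCESS) {
        fprintf(stderr, "nvmlInit failed: %s\n", nvmlErrorString(rc));
        return 1;
    }

    nvmlDevice_t dev;
    rc = nvmlDeviceGetHandleByIndex_v2(0, &dev);
    if (rc == NVML_SUCCESS) {
        unsigned int current = 0, pending = 0;
        // Query the current and pending (after reset) MIG mode.
        rc = nvmlDeviceGetMigMode(dev, &current, &pending);
        if (rc == NVML_SUCCESS) {
            printf("MIG current=%s pending=%s\n",
                   current == NVML_DEVICE_MIG_ENABLE ? "enabled" : "disabled",
                   pending == NVML_DEVICE_MIG_ENABLE ? "enabled" : "disabled");
        } else {
            fprintf(stderr, "MIG query failed: %s\n", nvmlErrorString(rc));
        }
    }
    nvmlShutdown();
    return 0;
}
```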
- Fourth-generation NVIDIA NVLink
  - 900 GB/sec total bandwidth for multi-GPU IO operations
  - 7x the bandwidth of PCIe Gen 5
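With 18 links totaling 900 GB/sec, each NVLink4 link carries 50 GB/sec of bidirectional bandwidth, and the 7x claim checks out against PCIe Gen 5's 128 GB/sec:

$$
18 \times 50\,\text{GB/s} = 900\,\text{GB/s}, \qquad \frac{900\,\text{GB/s}}{128\,\text{GB/s}} \approx 7
$$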
- Third-generation NVSwitch
  - NVSwitches reside both inside and outside of nodes, connecting multiple GPUs in servers, clusters, and data center environments
- NVLink Switch System
  - New second-level NVLink Switches based on third-generation NVSwitch technology
  - Connects up to 32 nodes or 256 GPUs over NVLink in a 2:1-tapered, fat-tree topology
- PCIe Gen 5
  - 128 GB/sec total bandwidth (64 GB/sec in each direction)
  - Double PCIe Gen 4's 64 GB/sec total bandwidth (32 GB/sec in each direction)
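The PCIe figures are the standard x16 arithmetic: Gen 5 signals at 32 GT/s per lane, and the ~1.5% overhead of 128b/130b encoding is conventionally rounded away:

$$
\frac{32\,\text{GT/s} \times 16\ \text{lanes}}{8\,\text{bit/byte}} \times \frac{128}{130} \approx 63\,\text{GB/s per direction} \approx 64\,\text{GB/s}
$$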