# H100 White Paper
H100 is NVIDIA’s 9th-generation data center GPU. For today’s mainstream AI and HPC models, H100 with InfiniBand interconnect delivers up to 30 times the performance of A100.
Specs | H100 | A100 (80GB) | V100 |
---|---|---|---|
Transistor Count | 80B | 54.2B | 21.1B |
TDP | 700W | 400W | 300/350W |
Manufacturing Process | TSMC 4N | TSMC 7N | TSMC 12nm FFN |
Form Factor | SXM5 | SXM4 | SXM2/SXM3 |
Architecture | Hopper | Ampere | Volta |
FP32 CUDA Cores | 16896 | 6912 | 5120 |
Tensor Cores | 528 | 432 | 640 |
Boost Clock (GHz) | 1.78 | 1.41 | 1.53 |
Memory Type | HBM3 | HBM2e | HBM2 |
Memory Clock (Gbps) | 4.8 | 3.2 | 1.75 |
Memory Bus Width | 5120-bit | 5120-bit | 4096-bit |
Memory Bandwidth (TB/s) | 3 | 2 | 0.9 |
GPU Memory Capacity (GB) | 80 | 80 | 16/32 |
FP32 Vector | 60 TFLOPS | 19.5 TFLOPS | 15.7 TFLOPS |
FP64 Vector | 30 TFLOPS | 9.7 TFLOPS | 7.8 TFLOPS |
INT8 Tensor (dense) | 2000 TOPS | 624 TOPS | NA |
FP16 Tensor (dense) | 1000 TFLOPS | 312 TFLOPS | 125 TFLOPS |
TF32 Tensor (dense) | 500 TFLOPS | 156 TFLOPS | NA |
FP64 Tensor | 60 TFLOPS | 19.5 TFLOPS | NA |
Interconnect | NVLink4 18 Links (900 GB/s) | NVLink3 12 Links (600 GB/s) | NVLink2 6 Links (300 GB/s) |
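As a cross-check, the vector FLOPS rows follow directly from the core counts and boost clocks in the table, counting one FMA as two floating-point operations:

$$
\text{Peak FP32} = N_{\text{FP32 cores}} \times 2\,\tfrac{\text{FLOP}}{\text{cycle}} \times f_{\text{boost}}
$$

H100: $16896 \times 2 \times 1.78\,\text{GHz} \approx 60\,\text{TFLOPS}$; A100: $6912 \times 2 \times 1.41\,\text{GHz} \approx 19.5\,\text{TFLOPS}$; V100: $5120 \times 2 \times 1.53\,\text{GHz} \approx 15.7\,\text{TFLOPS}$.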
- Fourth-generation Tensor Cores
  - Up to 6x faster chip-to-chip than A100, combining the per-SM speedup, the additional SM count, and higher clocks
- 3x faster IEEE FP64 and FP32 processing rates chip-to-chip compared to A100
- New Thread Block Cluster feature, adding another level to the programming hierarchy, which now includes Threads, Thread Blocks, Thread Block Clusters, and Grids; a minimal launch sketch follows below
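A minimal sketch of the cluster level, assuming CUDA 12 and an sm_90 target (`nvcc -arch=sm_90`); the kernel name and buffer are hypothetical. It launches a grid of two blocks grouped into one 2-block cluster and synchronizes across the whole cluster via cooperative groups:

```cuda
#include <cooperative_groups.h>
#include <cstdio>

namespace cg = cooperative_groups;

// Hypothetical kernel: each block in the cluster records its rank,
// then all blocks in the cluster synchronize at a shared barrier.
__global__ void clusterKernel(int *out) {
    cg::cluster_group cluster = cg::this_cluster();
    if (threadIdx.x == 0) {
        out[cluster.block_rank()] = (int)cluster.block_rank();
    }
    cluster.sync();  // barrier across the whole Thread Block Cluster
}

int main() {
    int *out;
    cudaMalloc(&out, 2 * sizeof(int));

    // Launch configuration: grid of 2 blocks, grouped into one 2-block cluster.
    cudaLaunchConfig_t config = {};
    config.gridDim = dim3(2, 1, 1);
    config.blockDim = dim3(128, 1, 1);

    cudaLaunchAttribute attr;
    attr.id = cudaLaunchAttributeClusterDimension;
    attr.val.clusterDim.x = 2;  // 2 thread blocks per cluster
    attr.val.clusterDim.y = 1;
    attr.val.clusterDim.z = 1;
    config.attrs = &attr;
    config.numAttrs = 1;

    cudaLaunchKernelEx(&config, clusterKernel, out);
    cudaDeviceSynchronize();
    cudaFree(out);
    return 0;
}
```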
- Transformer Engine
  - A combination of software and custom Hopper Tensor Core technology
  - Intelligently manages and dynamically chooses between FP8 and 16-bit calculations
  - Up to 9x faster AI training and up to 30x faster AI inference on large language models compared to the prior-generation A100
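Hopper's FP8 comes in two formats, E4M3 (more mantissa bits, finer precision) and E5M2 (more exponent bits, wider range), and the Transformer Engine's per-layer format choice trades between them. A small sketch, assuming the `cuda_fp8.h` header from CUDA 11.8+, showing the precision loss of a round trip through each format:

```cuda
#include <cuda_fp8.h>
#include <cstdio>

int main() {
    float x = 3.14159f;

    // E4M3: 4 exponent bits, 3 mantissa bits -- finer precision, smaller range.
    __nv_fp8_e4m3 a = __nv_fp8_e4m3(x);
    // E5M2: 5 exponent bits, 2 mantissa bits -- coarser precision, wider range.
    __nv_fp8_e5m2 b = __nv_fp8_e5m2(x);

    printf("original:        %f\n", x);
    printf("e4m3 round trip: %f\n", (float)a);
    printf("e5m2 round trip: %f\n", (float)b);
    return 0;
}
```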
- HBM3 memory subsystem
  - Nearly a 2x bandwidth increase over the previous generation
  - 3 TB/sec of memory bandwidth
- 50 MB L2 cache
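The 3 TB/sec figure is consistent with the bus width and pin rate from the table above:

$$
\text{BW}_{\text{H100}} = \frac{5120\,\text{bit} \times 4.8\,\text{Gbps}}{8\,\text{bit/byte}} \approx 3.07\,\text{TB/s},
\qquad
\text{BW}_{\text{A100}} = \frac{5120\,\text{bit} \times 3.2\,\text{Gbps}}{8\,\text{bit/byte}} \approx 2.05\,\text{TB/s}
$$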
- Second-generation Multi-Instance GPU (MIG) technology
  - Approximately 3x more compute capacity and nearly 2x more memory bandwidth per GPU Instance compared to A100
  - Confidential Computing capability with MIG-level Trusted Execution Environments (TEEs)
  - Up to seven individual GPU Instances, each with dedicated NVDEC and NVJPG units
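A minimal sketch, assuming the NVML C API that ships with the driver (compile and link with `-lnvidia-ml`), that queries whether MIG mode is enabled on device 0; error handling is abbreviated:

```c
#include <nvml.h>
#include <stdio.h>

int main(void) {
    nvmlReturn_t rc = nvmlInit_v2();
    if (rc != NVML_SUCCESS) {
        fprintf(stderr, "nvmlInit failed: %s\n", nvmlErrorString(rc));
        return 1;
    }

    nvmlDevice_t dev;
    rc = nvmlDeviceGetHandleByIndex_v2(0, &dev);
    if (rc == NVML_SUCCESS) {
        unsigned int current = 0, pending = 0;
        // Query the current and pending (after reset) MIG mode.
        rc = nvmlDeviceGetMigMode(dev, &current, &pending);
        if (rc == NVML_SUCCESS) {
            printf("MIG current=%s pending=%s\n",
                   current == NVML_DEVICE_MIG_ENABLE ? "enabled" : "disabled",
                   pending == NVML_DEVICE_MIG_ENABLE ? "enabled" : "disabled");
        } else {
            fprintf(stderr, "MIG query failed: %s\n", nvmlErrorString(rc));
        }
    }
    nvmlShutdown();
    return 0;
}
```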
- Fourth-generation NVIDIA NVLink
  - 900 GB/sec total bandwidth for multi-GPU IO operations
  - 7x the bandwidth of PCIe Gen 5
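With 18 links totaling 900 GB/sec, each NVLink4 link carries 50 GB/sec of bidirectional bandwidth, and the 7x claim checks out against PCIe Gen 5's 128 GB/sec:

$$
18 \times 50\,\text{GB/s} = 900\,\text{GB/s}, \qquad \frac{900\,\text{GB/s}}{128\,\text{GB/s}} \approx 7
$$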
- Third-generation NVSwitch
  - NVSwitches reside both inside and outside of nodes, connecting multiple GPUs in servers, clusters, and data center environments
- NVLink Switch System
  - New second-level NVLink Switches based on third-generation NVSwitch technology
  - Connects up to 32 nodes or 256 GPUs over NVLink in a 2:1-tapered, fat-tree topology
- PCIe Gen 5
  - 128 GB/sec total bandwidth (64 GB/sec in each direction)
  - Double PCIe Gen 4's 64 GB/sec total bandwidth (32 GB/sec in each direction)
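The PCIe figures are the standard x16 arithmetic: Gen 5 signals at 32 GT/s per lane, and the ~1.5% overhead of 128b/130b encoding is conventionally rounded away:

$$
\frac{32\,\text{GT/s} \times 16\ \text{lanes}}{8\,\text{bit/byte}} \times \frac{128}{130} \approx 63\,\text{GB/s per direction} \approx 64\,\text{GB/s}
$$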