"Efficiency of Modern Neural Network Training Turbocharged with Lower Precision Data Types on A100 GPUs: PyTorch Leverages Mixed Precision to Boost Performance, Reduce Memory Usage"

What Every User Should Know About Mixed Precision Training in PyTorch

by Syed Ahmed, Christian Sarofeen, Mike Ruberry, Eddie Yan, Natalia Gimelshein, Michael Carilli, Szymon Migacz, Piotr Bialecki, Paulius Micikevicius, Dusan Stosic, Dong Yang, and Naoya Maruyama

Efficient training of modern neural networks often relies on using lower precision data types. Peak float16 matrix multiplication and convolution performance is 16x faster than peak float32 performance on A100 GPUs. And since the float16 and bfloat16 data types are only half the size of float32 they can d...