State-of-the-Art Multiplatform Matrix Multiplication Kernels
Few algorithmic problems are as central to modern computing as matrix
multiplication. It is fundamental to AI, forming the basis of the fully
connected layers used throughout neural networks. In transformer
architectures, most of the computation is spent on matrix multiplication:
both the attention mechanism and the feed-forward layers reduce to it.
And since compute largely determines capability, faster matrix
multiplication kernels translate directly into more powerful models [1].
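To fix ideas, here is a minimal sketch of the operation the rest of the article is about: a naive, unoptimized matrix multiplication over row-major slices. This is an illustrative baseline only (the function name and layout are assumptions, not Burn's API); state-of-the-art kernels replace this triple loop with tiling, vectorization, and hardware-specific instructions.

```rust
/// Naive O(m*k*n) matrix multiplication: C = A * B.
/// `a` is m x k, `b` is k x n, both row-major; returns C as m x n.
/// Illustrative baseline only -- not Burn's actual kernel.
fn matmul_naive(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    let mut c = vec![0.0f32; m * n];
    for i in 0..m {
        for p in 0..k {
            // Hoisting a[i][p] lets the inner loop stream through
            // contiguous rows of B and C, which is cache-friendly.
            let a_ip = a[i * k + p];
            for j in 0..n {
                c[i * n + j] += a_ip * b[p * n + j];
            }
        }
    }
    c
}

fn main() {
    // [[1, 2], [3, 4]] * [[5, 6], [7, 8]] = [[19, 22], [43, 50]]
    let c = matmul_naive(&[1.0, 2.0, 3.0, 4.0], &[5.0, 6.0, 7.0, 8.0], 2, 2, 2);
    println!("{:?}", c);
}
```

Every fully connected layer and attention score computation is, at its core, this operation at much larger scale.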
NVIDIA probably deserves much of the credit for making matrix
multiplication...
Read more at burn.dev