News Score: Score the News, Sort the News, Rewrite the Headlines

Beating NumPy’s matrix multiplication in 150 lines of C code

TL;DR: The code from the tutorial is available at matmul.c. This blog post is the result of my attempt to implement high-performance matrix multiplication on a CPU while keeping the code simple, portable and scalable. The implementation follows the BLIS design, works for arbitrary matrix sizes, and, when fine-tuned for an AMD Ryzen 7700 (8 cores), outperforms NumPy (backed by OpenBLAS), achieving over 1 TFLOPS of peak performance across a wide range of matrix sizes. By efficiently parallelizing the code wit...

Read more at salykova.github.io
