Performance optimization, and how to do it wrong
I recently tried to optimize convolutions using SIMD instructions, but what I
thought would be a simple task ended up taking me days, with one issue after
another popping up. Some of them make sense in
hindsight, but others were utterly baffling. While the specific examples are for
direct convolution, these considerations apply to pretty much any code with a
hot loop.
Background
I work on burn and recently wanted to optimize direct convolution on the
burn-ndarray CPU backend.
For con...
Read more at genna.win