Optimizing a WebGPU Matmul Kernel for 1TFLOP+ Performance
I work at Nomic, where many of my colleagues build large t-SNE-like visualizations that run in the browser [1]. Showing tens of millions of data points in the browser without rendering your computer an oven is no easy challenge. I overhear many of the scaling problems solved by Deepscatter, first developed by Ben Schmidt. However, many of the conversations I overhear revolve around TypeScript and how awesome WebGPU is. At the time of writing, I couldn’t find any autograd libraries buil...
Read more at zanussbaum.substack.com