WASM Speed Doubled: llama.cpp PR Optimizes SIMD for Faster Dot Product Functions

ggml : x2 speed for WASM by optimizing SIMD

ggml : x2 speed for WASM by optimizing SIMD (via) PR by Xuan-Son Nguyen for llama.cpp: This PR provides a big jump in speed for WASM by leveraging SIMD instructions for qX_K_q8_K and qX_0_q8_0 dot product functions. Surprisingly, 99% of the code in this PR is written by DeekSeek-R1. The only thing I do is to develop tests and write prompts (with some trails and errors) They shared their prompts here, which they ran directly through R1 on chat.deepseek.com - it spent 3-5 minutes "thinking" about ...