Embedding Millions of Text Documents With Qwen3
We recently used Qwen3-Embedding-0.6B to embed millions of text documents while sustaining near-100% GPU utilization the whole way.That’s usually the gold standard that machine learning engineers aim for… but here’s the twist: in the time it took to write this blog post, we found a way to make the same workload 3× faster, and it didn’t involve maxing out GPU utilization at all. That story’s for another post, but first, here’s the recipe that got us to near-100%.The workloadHere at the Daft kitch...
Read more at daft.ai