Daft.ai Achieves Near-100% GPU Utilization Embedding Millions of Texts with Qwen3; 3× Faster Method Discovered

Embedding Millions of Text Documents With Qwen3

We recently used Qwen3-Embedding-0.6B to embed millions of text documents while sustaining near-100% GPU utilization the whole way. That’s usually the gold standard that machine learning engineers aim for… but here’s the twist: in the time it took to write this blog post, we found a way to make the same workload 3× faster, and it didn’t involve maxing out GPU utilization at all. That story’s for another post, but first, here’s the recipe that got us to near-100%.

The workload

Here at the Daft kitch...