Voyage-code-3: New Embedding Model Outperforms Rivals, Offers Lower Dimensions and Quantization for Efficient Code Retrieval

voyage-code-3: more accurate code retrieval with lower dimensional, quantized embeddings

TL;DR – Introducing voyage-code-3, our next-generation embedding model optimized for code retrieval. On a suite of 32 code retrieval datasets, it outperforms OpenAI-v3-large and CodeSage-large by an average of 13.80% and 16.81%, respectively. By supporting smaller dimensions through Matryoshka learning and quantized formats such as int8 and binary, voyage-code-3 can also dramatically reduce storage and search costs with minimal impact on retrieval quality. Since its launch in January, voyage-code-2 has bee...
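
To give a rough sense of why smaller dimensions and quantized formats cut storage, here is a minimal sketch in plain Python. It is not the voyage-code-3 implementation or API: the helper names, the 1024 and 256 dimension sizes, the random stand-in vector, and the simple scale-and-round int8 scheme are all assumptions chosen for illustration. The only ideas taken from the text above are that a Matryoshka-style embedding can be shortened by keeping its leading components, and that values can be stored as int8 or as single bits.

```python
import math
import random


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components (Matryoshka-style) and L2-normalize."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]


def quantize_int8(vec, scale=127.0):
    """Map floats assumed to lie in [-1, 1] to integers in [-127, 127].

    This scale-and-round scheme is an assumption, not the model's own method.
    """
    return [max(-127, min(127, round(x * scale))) for x in vec]


def quantize_binary(vec):
    """Keep one sign bit per dimension, packed 8 dimensions per byte."""
    out = bytearray((len(vec) + 7) // 8)
    for i, x in enumerate(vec):
        if x > 0:
            out[i // 8] |= 1 << (7 - i % 8)
    return bytes(out)


def storage_bytes(dim, bits_per_dim):
    """Bytes needed to store one vector of `dim` values at `bits_per_dim`."""
    return dim * bits_per_dim // 8


random.seed(0)
# Hypothetical stand-in for a full-size float embedding returned by a model.
full = [random.gauss(0.0, 1.0) for _ in range(1024)]

small = truncate_and_normalize(full, 256)
as_int8 = quantize_int8(small)
as_bits = quantize_binary(small)

print(storage_bytes(1024, 32))  # float32 at 1024 dims -> 4096 bytes
print(storage_bytes(256, 8))    # int8 at 256 dims     -> 256 bytes
print(len(as_bits))             # binary at 256 dims   -> 32 bytes
```

Under these assumed sizes, moving from 1024 float32 values to 256 binary values shrinks each stored vector from 4096 bytes to 32 bytes. How much retrieval quality survives that reduction depends on the model and is not something this sketch measures.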