Voyage Code 3
Voyage Code 3 is Voyage AI's code-specialized embedding model. It has a 32,000-token context window, supports more than 300 programming languages, and produces Matryoshka embeddings that can be shortened to smaller dimensions. Across 32 code retrieval datasets, it outperforms OpenAI text-embedding-3-large by 13.80%.
```typescript
import { embed } from 'ai';

// Request an embedding from Voyage Code 3 via its AI Gateway model ID.
const result = await embed({
  model: 'voyage/voyage-code-3',
  value: 'Sunny day at the beach',
});
```

About Voyage Code 3
Voyage AI released Voyage Code 3, its code-specialized embedding model, on September 1, 2024. It supports a 32,000-token context window and produces embeddings at four dimensions: 2048, 1024, 512, and 256. Voyage AI trained it on trillions of tokens of mixed text, code, and mathematical content, plus real-world query-code pairs from GitHub repositories. It covers more than 300 programming languages.
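Matryoshka training concentrates the most important information in the leading components of the vector. A shorter embedding can therefore be obtained by keeping a prefix of a longer one. The sketch below assumes you already hold a full 2048-dimensional vector. It truncates that vector client-side to one of the listed dimensions and then re-normalizes it to unit length, so cosine and dot-product scores stay comparable. `truncateEmbedding` and `SUPPORTED_DIMS` are hypothetical names, not part of any SDK. Where the provider lets you request a smaller dimension directly, doing so may be preferable.

```typescript
// Dimensions listed for Voyage Code 3; used here only as a sanity check.
const SUPPORTED_DIMS: readonly number[] = [2048, 1024, 512, 256];

// Hypothetical helper: keep the first `dim` components of a Matryoshka-style
// embedding, then rescale the prefix to unit L2 norm.
function truncateEmbedding(embedding: number[], dim: number): number[] {
  if (!SUPPORTED_DIMS.includes(dim)) {
    throw new Error(`Unsupported dimension ${dim}`);
  }
  if (dim > embedding.length) {
    throw new Error(`Cannot truncate a ${embedding.length}-dim vector to ${dim}`);
  }
  const prefix = embedding.slice(0, dim);
  const norm = Math.sqrt(prefix.reduce((sum, x) => sum + x * x, 0));
  // Leave an all-zero prefix unchanged to avoid dividing by zero.
  return norm === 0 ? prefix : prefix.map((x) => x / norm);
}

// Usage with a synthetic 2048-dim vector standing in for a real embedding.
const fullVector = Array.from({ length: 2048 }, (_, i) => Math.sin(i + 1));
const shortVector = truncateEmbedding(fullVector, 256);
console.log(shortVector.length); // 256
```

Re-normalizing matters because a prefix of a unit vector usually has a norm below 1. Without rescaling, dot-product scores at different dimensions would not be on the same scale.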
Across 32 code retrieval datasets, Voyage Code 3 outperforms OpenAI text-embedding-3-large by 13.80% and CodeSage-large by 16.81%. At 1024 dimensions, it retains 92.28% of its full-precision quality, compared to 77.64% for OpenAI at the same dimension. Because quality degrades so little, reducing dimensions is an effective way to cut storage cost and search latency.
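Halving the dimension halves both the bytes stored per float32 vector (2048 × 4 = 8,192 bytes versus 1024 × 4 = 4,096 bytes) and the multiply-adds per comparison. The sketch below shows brute-force top-k retrieval by cosine similarity over an index of reduced-dimension vectors. `IndexedSnippet`, `cosine`, and `topK` are hypothetical names. A production system would typically use an approximate nearest-neighbor index instead of a linear scan. The `ai` package also exports a `cosineSimilarity` helper that could replace the local function.

```typescript
// Hypothetical index entry: a source file path and its embedding.
interface IndexedSnippet {
  path: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Dimension mismatch");
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Linear-scan top-k search: score every entry, sort descending, keep k.
function topK(query: number[], index: IndexedSnippet[], k: number) {
  return index
    .map((s) => ({ path: s.path, score: cosine(query, s.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Usage with toy 4-dim vectors standing in for truncated embeddings.
const toyIndex: IndexedSnippet[] = [
  { path: "a.ts", embedding: [1, 0, 0, 0] },
  { path: "b.ts", embedding: [0, 1, 0, 0] },
  { path: "c.ts", embedding: [0.9, 0.1, 0, 0] },
];
console.log(topK([1, 0, 0, 0], toyIndex, 2).map((h) => h.path)); // [ 'a.ts', 'c.ts' ]
```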
Thanks to quantization-aware training, Voyage Code 3 can return embeddings as 32-bit floats, int8, uint8, binary, or unsigned binary. Binary embeddings at 256 dimensions still outperform OpenAI text-embedding-3-large by 4.81% while using 1/384th the storage of 3072-dimensional float embeddings. These compression options make Voyage Code 3 practical for very large codebases where millions of files need indexing.
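The 1/384 figure follows from simple arithmetic. A 3072-dimensional float32 vector takes 3072 × 4 = 12,288 bytes, a 256-dimensional binary vector takes 256 / 8 = 32 bytes, and 12,288 / 32 = 384. The sketch below reproduces that calculation. It also shows a generic sign-based binarization with Hamming-distance comparison, a common way to search binary embeddings. `binarize`, `hamming`, `float32Bytes`, and `binaryBytes` are hypothetical helpers. Voyage AI's `binary` and `ubinary` outputs are produced by the API itself, and their exact bit packing may differ from this scheme.

```typescript
// Generic sign-based binarization: bit i is 1 when component i is positive.
// Bits are packed MSB-first, 8 per byte. Not necessarily Voyage AI's encoding.
function binarize(embedding: number[]): Uint8Array {
  const bytes = new Uint8Array(Math.ceil(embedding.length / 8));
  embedding.forEach((x, i) => {
    if (x > 0) bytes[i >> 3] |= 0x80 >> (i & 7);
  });
  return bytes;
}

// Hamming distance between two packed bit vectors (lower = more similar).
function hamming(a: Uint8Array, b: Uint8Array): number {
  if (a.length !== b.length) throw new Error("Length mismatch");
  let dist = 0;
  for (let i = 0; i < a.length; i++) {
    let v = a[i] ^ b[i];
    while (v) {
      dist += v & 1; // count set bits one at a time
      v >>= 1;
    }
  }
  return dist;
}

// Bytes needed to store one vector in each format.
const float32Bytes = (dims: number) => dims * 4;
const binaryBytes = (dims: number) => Math.ceil(dims / 8);

console.log(float32Bytes(3072)); // 12288
console.log(binaryBytes(256)); // 32
console.log(float32Bytes(3072) / binaryBytes(256)); // 384
```

Hamming distance reduces comparison to XOR plus a bit count. This is one reason binary embeddings scale well beyond storage savings alone.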