Voyage Code 3 is Voyage AI's code-specialized embedding model, released December 4, 2024. It supports a context window of 32,000 tokens and produces embeddings at four output sizes: 2048, 1024, 512, and 256 dimensions. Voyage AI trained it on trillions of tokens combining text, code, and mathematical content, plus real-world query-code pairs from GitHub repositories. It covers over 300 programming languages.
Across 32 code retrieval datasets, Voyage Code 3 outperforms OpenAI text-embedding-3-large by an average of 13.80% and CodeSage-large by an average of 16.81%. At 1024 dimensions, it retains 92.28% of its full-precision quality, compared to 77.64% for OpenAI at the same dimension. Because so little quality is lost, reducing the dimension is an effective way to cut storage cost and search latency.
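To make the idea of dimension reduction concrete, the sketch below shortens a vector by keeping its leading components and rescaling it to unit length. It assumes the shorter sizes behave like prefixes of the full vector (a Matryoshka-style property that the text above does not state), and the helper names and 8-component toy vectors are hypothetical stand-ins for real embeddings. In practice the smaller size would normally be requested from the embedding service directly rather than computed by hand.

```python
import math


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components of `vec` and rescale to unit length.

    Assumption: the leading components carry most of the signal, as in
    Matryoshka-style embeddings. This is an illustration, not Voyage AI's code.
    """
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # an all-zero prefix cannot be normalized
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 8-component vectors standing in for real query and code embeddings.
query = [0.5, -0.5, 0.5, 0.5, 0.1, -0.1, 0.0, 0.2]
doc = [0.4, -0.6, 0.5, 0.4, -0.2, 0.1, 0.3, 0.0]

full_score = cosine(query, doc)
short_score = cosine(truncate_and_normalize(query, 4),
                     truncate_and_normalize(doc, 4))
print(f"full: {full_score:.4f}  truncated to 4 dims: {short_score:.4f}")
```

Comparing the two scores on real data is one way to check how much ranking quality a shorter vector actually costs for a given codebase.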
Thanks to quantization-aware training, the model can return embeddings in 32-bit float, int8, uint8, binary, and unsigned binary formats. Binary embeddings at 256 dimensions still outperform OpenAI text-embedding-3-large by 4.81% while using 1/384th the storage of 3072-dimensional float embeddings. These compression options make Voyage Code 3 practical for very large codebases where millions of files need indexing.
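The 1/384 figure follows from simple arithmetic: 3072 components at 32 bits each against 256 components at 1 bit each. The sketch below works that out and also shows one common way to binarize a vector, by thresholding each component at zero and packing eight bits per byte. The format labels, function names, and the sign-threshold scheme are assumptions made for illustration and are not taken from Voyage AI's implementation.

```python
# Bits used per component for each format (labels are illustrative).
BITS_PER_COMPONENT = {
    "float32": 32,
    "int8": 8,
    "uint8": 8,
    "binary": 1,
    "unsigned_binary": 1,
}


def bytes_per_vector(dim, fmt):
    """Storage for one embedding, rounded up to whole bytes."""
    bits = dim * BITS_PER_COMPONENT[fmt]
    return (bits + 7) // 8


def binarize(vec):
    """Map each component to 1 if positive, else 0, and pack 8 bits per byte.

    Assumption: a plain sign threshold with the first component in the
    most significant bit. Real services may pack or offset bits differently.
    """
    if len(vec) % 8 != 0:
        raise ValueError("vector length must be a multiple of 8")
    packed = bytearray()
    for start in range(0, len(vec), 8):
        byte = 0
        for x in vec[start:start + 8]:
            byte = (byte << 1) | (1 if x > 0 else 0)
        packed.append(byte)
    return bytes(packed)


def hamming_distance(a, b):
    """Number of differing bits between two packed binary embeddings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


baseline = bytes_per_vector(3072, "float32")  # 3072 * 4 bytes
compact = bytes_per_vector(256, "binary")     # 256 / 8 bytes
print(baseline, compact, baseline // compact)  # 12288 32 384

# Index size for one million embeddings in each configuration.
n_vectors = 1_000_000
print(f"{n_vectors * baseline / 1e9:.2f} GB vs {n_vectors * compact / 1e6:.0f} MB")
```

Packed binary vectors are typically compared with Hamming distance, as in `hamming_distance` above, which is cheap enough to scan millions of entries before rescoring a shortlist with higher-precision vectors.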