Voyage Code 3
Voyage Code 3 is Voyage AI's code-specialized embedding model with a 32K-token context window, support for 300+ programming languages, and Matryoshka dimensionality. It outperforms OpenAI text-embedding-3-large by 13.80% on code retrieval across 32 datasets.
```typescript
import { embed } from 'ai';
// Embed one input through AI Gateway using the Voyage Code 3 model slug.
const result = await embed({ model: 'voyage/voyage-code-3', value: 'Sunny day at the beach' });
```
Providers
Route requests across multiple providers. Copy a provider slug to set your preference, and see the AI Gateway documentation for details. Using a provider means you agree to its terms, listed under Legal.
About Voyage Code 3
Voyage Code 3 is Voyage AI's code-specialized embedding model, released September 1, 2024. It supports a 32K-token context window and produces embeddings in four output dimensions: 2048, 1024, 512, and 256. Voyage AI trained it on trillions of tokens combining text, code, and mathematical content, plus real-world query-code pairs from GitHub repositories. It covers over 300 programming languages.
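Matryoshka training concentrates the most important information in the leading coordinates of the vector, so you can get a smaller embedding by keeping a prefix of the full one. The following minimal sketch shows that idea on the client side. It assumes you already have a full 2048-dimensional float embedding. The helper name `truncateEmbedding` is hypothetical, and in practice you would usually request the smaller dimension from the provider instead of truncating locally. Re-normalizing after truncation keeps dot products usable as cosine similarity.

```typescript
// Hypothetical helper: keep the first `dims` coordinates of a Matryoshka
// embedding and rescale the result to unit length.
function truncateEmbedding(embedding: number[], dims: number): number[] {
  const allowed = [2048, 1024, 512, 256]; // the four documented output sizes
  if (!allowed.includes(dims)) {
    throw new Error(`Unsupported dimension: ${dims}`);
  }
  if (embedding.length < dims) {
    throw new Error(`Embedding has only ${embedding.length} dimensions`);
  }
  const prefix = embedding.slice(0, dims);
  const norm = Math.sqrt(prefix.reduce((sum, x) => sum + x * x, 0));
  // Guard against an all-zero prefix to avoid dividing by zero.
  return norm === 0 ? prefix : prefix.map((x) => x / norm);
}

// Usage: shrink a placeholder 2048-dimensional vector to 256 dimensions.
const full = Array.from({ length: 2048 }, (_, i) => Math.sin(i + 1));
const small = truncateEmbedding(full, 256);
console.log(small.length); // 256
```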
Across 32 code retrieval datasets, Voyage Code 3 outperforms OpenAI text-embedding-3-large by 13.80% and CodeSage-large by 16.81%. At 1024 dimensions, it retains 92.28% of its full-precision quality, compared to 77.64% for OpenAI at the same dimension. This makes dimension reduction particularly effective for cost or latency optimization.
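The following rough estimate shows the cost side of dimension reduction. It computes raw float32 storage for an index at each supported dimension. The corpus of one million vectors (one vector per file) and the 4-bytes-per-float figure are illustrative assumptions. Real vector databases also add index overhead on top of the raw vector bytes.

```typescript
// Raw storage for float32 embeddings: 4 bytes per dimension per vector.
const BYTES_PER_FLOAT32 = 4;

function float32IndexBytes(vectors: number, dims: number): number {
  return vectors * dims * BYTES_PER_FLOAT32;
}

// Illustrative corpus size (an assumption, not a benchmark figure).
const files = 1_000_000;
for (const dims of [2048, 1024, 512, 256]) {
  const gib = float32IndexBytes(files, dims) / 1024 ** 3;
  console.log(`${dims} dims: ${gib.toFixed(2)} GiB`);
}
```

Raw storage scales linearly with dimension. In this example, going from 2048 to 1024 dimensions cuts the raw index from about 7.63 GiB to about 3.81 GiB.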
Quantization-aware training supports 32-bit float, int8, uint8, binary, and unsigned binary formats. Binary embeddings at 256 dimensions still outperform OpenAI text-embedding-3-large by 4.81% while using 1/384th the storage of 3072-dimensional float embeddings. These compression options make Voyage Code 3 practical for very large codebases where millions of files need indexing.
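Binary quantization keeps one bit per dimension, typically the sign of each coordinate, and compares vectors by Hamming distance. The following sketch packs a float vector into bits on the client side. It also checks the 1/384 storage figure from the paragraph above: 3072 dimensions × 32 bits against 256 dimensions × 1 bit. The sign-threshold packing and the helper names `binarize` and `hamming` are assumptions for illustration. Voyage AI computes its own binary outputs server-side, and Voyage AI's documentation defines their exact byte encoding.

```typescript
// Pack a float embedding into bits: 1 if the coordinate is positive, else 0.
// Eight dimensions fit in one byte, most significant bit first.
function binarize(embedding: number[]): Uint8Array {
  const packed = new Uint8Array(Math.ceil(embedding.length / 8));
  embedding.forEach((x, i) => {
    if (x > 0) packed[i >> 3] |= 0x80 >> (i & 7);
  });
  return packed;
}

// Hamming distance: number of differing bits between two packed vectors
// of equal length (smaller means more similar).
function hamming(a: Uint8Array, b: Uint8Array): number {
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    let diff = a[i] ^ b[i];
    while (diff) {
      diff &= diff - 1; // clear the lowest set bit
      distance++;
    }
  }
  return distance;
}

// Storage check: 3072 float32 dimensions vs. 256 binary dimensions.
const floatBits = 3072 * 32;
const binaryBits = 256 * 1;
console.log(floatBits / binaryBits); // 384
```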
What To Consider When Choosing a Provider
- Language coverage: Voyage Code 3 covers 300+ programming languages, the widest language coverage in Voyage AI's code embedding lineup. If your codebase spans multiple languages or includes less common ones, this breadth eliminates the need for language-specific models.
- Dimension reduction: At 1024 dimensions, Voyage Code 3 retains 92.28% of its quality, versus 77.64% for OpenAI text-embedding-3-large. This makes dimension reduction particularly effective for Voyage Code 3 and lets you build large-scale code search indices at lower storage cost.
- Upgrading from voyage-code-2: Compared to voyage-code-2, Voyage Code 3 doubles the context window (32K vs. 16K), adds Matryoshka dimensionality, and expands language coverage from a handful of languages to 300+. For new code search deployments, Voyage Code 3 is the recommended option.
- Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
When to Use Voyage Code 3
Best For
- Enterprise code search: Polyglot repositories that span many programming languages
- RAG for code generation: Retrieving relevant code examples, patterns, and documentation improves generated output quality
- Large-scale code indexing: Binary or int8 embeddings at 256-1024 dimensions keep storage costs manageable across millions of files
- Text-to-code retrieval: Natural language queries surface relevant functions, classes, and modules
- Code-to-code similarity: Detecting duplicates, finding related implementations, and recommending refactoring targets
- Developer tools and IDE integrations: Fast, accurate code search served as a backend service
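Text-to-code and code-to-code retrieval become the same operation once queries and code snippets are embedded: you rank stored vectors by their similarity to a query vector. The following minimal sketch runs a brute-force top-k cosine search over an in-memory index. The `IndexedSnippet` shape and the `topK` helper are hypothetical. The toy 3-dimensional vectors stand in for real Voyage Code 3 embeddings.

```typescript
interface IndexedSnippet {
  id: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the k snippets most similar to the query embedding, best first.
function topK(query: number[], index: IndexedSnippet[], k: number): IndexedSnippet[] {
  return index
    .map((s) => ({ s, score: cosine(query, s.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map(({ s }) => s);
}

// Toy vectors standing in for real embeddings.
const index: IndexedSnippet[] = [
  { id: 'parseJson', embedding: [0.9, 0.1, 0.0] },
  { id: 'readFile', embedding: [0.1, 0.9, 0.1] },
  { id: 'sortArray', embedding: [0.0, 0.2, 0.9] },
];
const hits = topK([0.8, 0.2, 0.1], index, 2);
console.log(hits.map((h) => h.id).join(', ')); // parseJson, readFile
```

A brute-force scan is fine for small indices. At the million-file scale mentioned above, you would typically use a vector database or an approximate nearest-neighbor index. The AI SDK also exports a `cosineSimilarity` helper from `ai` that could replace the hand-written `cosine` function.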
Consider Alternatives When
- Your retrieval corpus is primarily natural language: voyage-3.5 or voyage-3-large are better general-purpose choices when there's little or no code
- Your workload is exclusively in one domain like legal or finance: Domain-specific models may provide marginal accuracy gains
- You need multimodal embeddings: Pick a model with native image inputs for screenshots and diagrams
- Maximum cost efficiency is required: voyage-3.5-lite handles code as one of its eight evaluated domains when accuracy requirements are moderate
Conclusion
Voyage Code 3 achieves a 13.80% improvement in code retrieval quality over OpenAI text-embedding-3-large across 32 datasets and covers 300+ programming languages. Its Matryoshka dimensionality and quantization options make it practical for indexing large codebases at scale. Route requests through AI Gateway for unified access, usage tracking, and the flexibility to switch providers without changing your integration.
Frequently Asked Questions
How many programming languages does Voyage Code 3 support?
Over 300 programming languages. That is the widest language coverage in Voyage AI's code embedding lineup, suitable for polyglot repositories and enterprise codebases.
How does Voyage Code 3 compare to voyage-code-2?
Voyage Code 3 doubles the context window (32K vs. 16K), adds Matryoshka dimensionality (2048/1024/512/256), expands language coverage from a few languages to 300+, and adds quantization-aware training. It is the recommended choice for new code search deployments.
What is the storage savings with binary embeddings?
Binary embeddings at 256 dimensions use 1/384th the storage of 3072-dimensional float embeddings while still outperforming OpenAI text-embedding-3-large by 4.81%. This makes it feasible to index very large codebases efficiently.
How does dimension reduction quality compare to OpenAI?
At 1024 dimensions, Voyage Code 3 retains 92.28% of its full-precision quality, compared to 77.64% for OpenAI text-embedding-3-large. This means dimension reduction is significantly more effective with Voyage Code 3.
What training data does Voyage Code 3 use?
Voyage AI trained Voyage Code 3 on trillions of tokens combining text, code, and mathematical content, supplemented with real-world query-code pairs from GitHub repositories for practical retrieval relevance.
How do I access Voyage Code 3 through Vercel AI Gateway?
Send embedding requests through AI Gateway using the model slug `voyage/voyage-code-3`. AI Gateway authenticates requests with an API key or OIDC token and records usage. You can point the same client code at different provider models without managing Voyage AI credentials directly.
Can Voyage Code 3 handle text-to-code and code-to-code retrieval?
Yes. Voyage Code 3 is evaluated on both text-to-code retrieval (natural language queries finding relevant code) and code-to-code retrieval (finding similar implementations). Both are core strengths of the model.