Voyage Code 2
Voyage Code 2 is Voyage AI's code-specialized embedding model with a 16K-token context window. It improves code retrieval by 14.52% (recall@5) over OpenAI text-embedding-3-large and is evaluated on Python, C++, Java, and documentation for major ML frameworks.
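```typescript
import { embed } from 'ai';

// Embed a single input through AI Gateway using the Voyage Code 2 model slug.
const result = await embed({
  model: 'voyage/voyage-code-2',
  value: 'Sunny day at the beach',
});

console.log(result.embedding);
```

Providers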
Route requests across multiple providers. Set your provider preference with a provider slug; see the AI Gateway documentation for details. Using a provider means you agree to that provider's terms.
About Voyage Code 2
Voyage Code 2 is Voyage AI's code-specialized embedding model, released January 1, 2024. It has a 16K-token context window and targets code retrieval, code completion, and code assistant applications. On code retrieval tasks across 11 datasets derived from HumanEval, APPS, MBPP, DS-1000, CodeChef, and LeetCode, Voyage Code 2 achieves a 14.52% improvement in recall@5 over OpenAI text-embedding-3-large.
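Recall@5 measures how many of a query's relevant items land in the top five retrieved results. As a rough illustration, the sketch below computes recall@k per query and averages it across queries. This is a common definition, assumed here; Voyage AI's evaluation scripts may differ in details such as how queries with several relevant documents are weighted. The document IDs are made up.

```typescript
// Recall@k for one query: the fraction of the query's relevant documents
// that appear among the top-k ranked results. Assumes `ranked` has no duplicates.
function recallAtK(ranked: string[], relevant: Set<string>, k: number): number {
  if (relevant.size === 0) return 0;
  const hits = ranked.slice(0, k).filter((id) => relevant.has(id)).length;
  return hits / relevant.size;
}

// Mean recall@k across all queries, the single number a benchmark reports.
function meanRecallAtK(
  runs: { ranked: string[]; relevant: Set<string> }[],
  k: number,
): number {
  if (runs.length === 0) return 0;
  const total = runs.reduce((sum, r) => sum + recallAtK(r.ranked, r.relevant, k), 0);
  return total / runs.length;
}

// Toy runs with hypothetical IDs (not benchmark data).
const runs = [
  // Relevant item ranked 3rd: counted as a hit at k = 5.
  { ranked: ["fn_a", "fn_b", "fn_c", "fn_d", "fn_e", "fn_f"], relevant: new Set(["fn_c"]) },
  // Relevant item ranked 6th: a miss at k = 5.
  { ranked: ["fn_u", "fn_v", "fn_w", "fn_x", "fn_y", "fn_gold"], relevant: new Set(["fn_gold"]) },
];

console.log(meanRecallAtK(runs, 5)); // 0.5
```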
Voyage Code 2 also performs well on general-purpose text retrieval, exceeding OpenAI text-embedding-3-large by 3.03% and Cohere Embed v3 by 4.93%. You can use a single embedding model for both code and documentation retrieval rather than maintaining separate indices with different models.
Voyage AI evaluates it on Python, C++, and Java, plus documentation and usage patterns for Matplotlib, NumPy, Pandas, PyTorch, SciPy, scikit-learn, and TensorFlow. The model handles both natural language queries searching for code (text-to-code) and code snippets searching for similar code (code-to-code).
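Because one model embeds both natural language and code into the same vector space, text-to-code and code-to-code search can share one index and one ranking function. The sketch below ranks entries by cosine similarity. The three-dimensional vectors are toy values standing in for real Voyage Code 2 embeddings, and the file names are hypothetical. The ai package also exports a cosineSimilarity helper you could use in place of the hand-written one.

```typescript
// One in-memory index holding both code and documentation entries.
type Entry = { id: string; kind: "code" | "doc"; vector: number[] };

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Score every entry against the query vector and keep the top k.
// The query vector may come from a natural-language question (text-to-code)
// or from a code snippet (code-to-code); the ranking is identical.
function search(index: Entry[], query: number[], k: number): { id: string; score: number }[] {
  return index
    .map((e) => ({ id: e.id, score: cosine(e.vector, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy vectors, not real embeddings.
const index: Entry[] = [
  { id: "parse_csv.py", kind: "code", vector: [0.9, 0.1, 0.0] },
  { id: "pandas_read_csv.md", kind: "doc", vector: [0.8, 0.3, 0.1] },
  { id: "matmul.cpp", kind: "code", vector: [0.0, 0.2, 0.9] },
];
const queryVector = [1.0, 0.2, 0.0]; // stand-in for an embedded query like "read a CSV file"

console.log(search(index, queryVector, 2).map((r) => r.id));
// ["parse_csv.py", "pandas_read_csv.md"]
```

Note that the top two results mix a source file and a documentation page, which is the single-index setup described above.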
What To Consider When Choosing a Provider
- Configuration: Voyage Code 2 targets code search. If you're embedding source code, function signatures, and documentation for retrieval, it outperforms OpenAI text-embedding-3-large by 14.52% in recall@5 on Voyage AI's code retrieval benchmarks.
- Configuration: Voyage AI released voyage-code-3, which supports 300+ programming languages, a 32K context window, and Matryoshka dimensionality. Use voyage-code-3 for new deployments unless you need compatibility with existing Voyage Code 2 indices.
- Configuration: Despite its code focus, Voyage Code 2 outperforms several general-purpose models on standard text retrieval. Use it for mixed code-and-documentation corpora without a second model.
- Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
When to Use Voyage Code 2
Best For
- Code search engines: Retrieve relevant functions, classes, or modules from natural language queries
- Code completion pipelines: Retrieval-augmented generation finds similar code patterns
- Developer documentation search: API references, library docs, and code examples
- Mixed code and text retrieval: A single model handles both source code and natural language documentation
- ML framework documentation: Retrieval for Python-centric data science and machine learning workflows
Consider Alternatives When
- You need broader language coverage: Voyage-code-3 supports 300+ programming languages beyond Python, C++, and Java
- You need a longer context window: Voyage-code-3 offers 32K tokens versus Voyage Code 2's 16K tokens
- Your workload is general-purpose text with no code: A general-purpose embedding model like voyage-3.5 fits better
- You need Matryoshka dimensionality: Voyage-code-3 supports 2048/1024/512/256 dimensions for flexible sizing
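Matryoshka dimensionality means the leading components of a full-length vector form a usable smaller embedding. A minimal sketch, assuming you already hold a full-length voyage-code-3 vector: keep the first dims values and re-normalize so cosine scores stay comparable. Whether the API can return a smaller size directly is not covered here; check Voyage AI's documentation. This does not apply to Voyage Code 2 vectors, which this page does not describe as Matryoshka-trained.

```typescript
// Sizes listed for voyage-code-3 on this page.
const SUPPORTED_DIMS: number[] = [2048, 1024, 512, 256];

// Keep the leading `dims` components of a full-length Matryoshka embedding,
// then L2-normalize the shortened vector.
function truncateEmbedding(full: number[], dims: number): number[] {
  if (!SUPPORTED_DIMS.includes(dims)) {
    throw new Error(`unsupported dimension: ${dims}`);
  }
  if (dims > full.length) throw new Error("vector is shorter than the requested size");
  const head = full.slice(0, dims);
  const norm = Math.sqrt(head.reduce((s, x) => s + x * x, 0));
  return norm === 0 ? head : head.map((x) => x / norm);
}

// Toy stand-in for a 2048-dimensional embedding.
const full = Array.from({ length: 2048 }, (_, i) => Math.sin(i));
const small = truncateEmbedding(full, 256);
console.log(small.length); // 256
```

Index and query vectors must be truncated to the same size before you compare them.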
Conclusion
Voyage Code 2 delivers a 14.52% code retrieval improvement over OpenAI text-embedding-3-large. If you have existing Voyage Code 2 indices, you can keep them and avoid re-embedding your corpus. For new deployments, use voyage-code-3 for its broader language coverage, longer context window, and Matryoshka dimensionality. Route requests through AI Gateway for unified access.
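Keeping existing indices works because queries and documents must be embedded by the same model; vectors from Voyage Code 2 and voyage-code-3 are not comparable. A minimal guard, using a hypothetical IndexMeta record stored next to the index, could look like this. The voyage/voyage-code-3 slug is an assumption modeled on the slug used earlier on this page.

```typescript
// Hypothetical metadata stored alongside a vector index.
interface IndexMeta {
  model: string; // model slug used to embed every document in the index
}

// Refuse to score a query against an index built with a different model,
// since embeddings from different models live in different vector spaces.
function checkQueryModel(meta: IndexMeta, queryModel: string): void {
  if (queryModel !== meta.model) {
    throw new Error(
      `Index was embedded with ${meta.model}; re-embed it before querying with ${queryModel}.`,
    );
  }
}

const existingIndex: IndexMeta = { model: "voyage/voyage-code-2" };

checkQueryModel(existingIndex, "voyage/voyage-code-2"); // same model: keep the index

try {
  // Assumed slug for the newer model.
  checkQueryModel(existingIndex, "voyage/voyage-code-3");
} catch (err) {
  console.log((err as Error).message);
}
```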
Frequently Asked Questions
What programming languages does Voyage Code 2 support?
Voyage Code 2 is evaluated on Python, C++, and Java, along with documentation for major ML libraries (NumPy, Pandas, PyTorch, scikit-learn, TensorFlow, Matplotlib, SciPy). It handles both text-to-code and code-to-code retrieval.
How does Voyage Code 2 compare to OpenAI text-embedding-3-large on code?
Voyage Code 2 achieves a 14.52% improvement in recall@5 over OpenAI text-embedding-3-large across 11 code retrieval datasets derived from HumanEval, APPS, MBPP, DS-1000, CodeChef, and LeetCode.
Should I use Voyage Code 2 or voyage-code-3?
For new deployments, voyage-code-3 is the recommended choice. It supports 300+ programming languages (vs. Python, C++, Java), a 32K context window (vs. 16K), and flexible Matryoshka dimensions. Use Voyage Code 2 if you have existing indices and want to avoid re-embedding.
Can Voyage Code 2 handle general text retrieval too?
Yes. Voyage Code 2 exceeds OpenAI text-embedding-3-large by 3.03% on general-purpose text retrieval, making it viable for mixed code-and-documentation corpora.
What is the context window for Voyage Code 2?
16K tokens. This handles most individual source files and documentation pages. For larger code contexts, use voyage-code-3 with its 32K-token window.
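If a file is longer than the window, one common approach is to split it into chunks that each fit. The sketch below splits on line boundaries under a token budget. It estimates tokens at about four characters per token, a rough heuristic and an assumption here, not Voyage's tokenizer; use a real token count if you need exact limits. A single line longer than the budget still becomes its own oversized chunk.

```typescript
const CONTEXT_TOKENS = 16_000; // Voyage Code 2's 16K window
const CHARS_PER_TOKEN = 4; // heuristic estimate, not Voyage's tokenizer

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Split source text into line-aligned chunks whose estimated token count
// stays within maxTokens. Joining the chunks with "\n" restores the input.
function chunkByLines(source: string, maxTokens: number = CONTEXT_TOKENS): string[] {
  const chunks: string[] = [];
  let current: string[] = [];
  let currentChars = 0; // length of current.join("\n")
  for (const line of source.split("\n")) {
    const added = current.length === 0 ? line.length : line.length + 1; // +1 for "\n"
    if (current.length > 0 && Math.ceil((currentChars + added) / CHARS_PER_TOKEN) > maxTokens) {
      chunks.push(current.join("\n")); // flush the full chunk
      current = [line];
      currentChars = line.length;
    } else {
      current.push(line);
      currentChars += added;
    }
  }
  if (current.length > 0) chunks.push(current.join("\n"));
  return chunks;
}

console.log(chunkByLines("def add(a, b):\n    return a + b\n").length); // 1
```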
How do I authenticate Voyage Code 2 through Vercel AI Gateway?
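AI Gateway authenticates requests with an API key or OIDC token, so you do not need to manage provider credentials directly. Send embedding requests through AI Gateway using the voyage/voyage-code-2 model slug. If you want to use your own Voyage AI API key instead, add it in AI Gateway settings.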
What retrieval tasks is Voyage Code 2 evaluated on?
11 code retrieval datasets totaling 43,909 query-document pairs, derived from HumanEval, APPS, MBPP, DS-1000, CodeChef, and LeetCode. The evaluation metric is recall@5.