Voyage Code 2
Voyage Code 2 is Voyage AI's code-specialized embedding model with a 16K-token context window. It improves code retrieval recall@5 by 14.52% over OpenAI text-embedding-3-large and is evaluated on Python, C++, Java, and documentation for major ML frameworks.
```typescript
import { embed } from 'ai';

// Embed a single value with Voyage Code 2 through AI Gateway.
const result = await embed({
  model: 'voyage/voyage-code-2',
  value: 'Sunny day at the beach',
});
```
Providers
Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
About Voyage Code 2
Voyage Code 2 is Voyage AI's code-specialized embedding model, released January 1, 2024. It has a 16K-token context window and targets code retrieval, code completion, and code assistant applications. On code retrieval tasks across 11 datasets derived from HumanEval, APPS, MBPP, DS-1000, CodeChef, and LeetCode, Voyage Code 2 achieves a 14.52% improvement in recall@5 over OpenAI text-embedding-3-large.
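For readers unfamiliar with the metric, recall@5 measures how many of the relevant items for a query show up among the top five retrieved results. The sketch below is a minimal illustration of that computation, not Voyage AI's evaluation harness; the function names and document IDs are hypothetical.

```typescript
// Recall@k for one query: the fraction of relevant items found in the top k results.
function recallAtK(ranked: string[], relevant: Set<string>, k: number): number {
  if (relevant.size === 0) return 0;
  const hits = ranked.slice(0, k).filter((id) => relevant.has(id)).length;
  return hits / relevant.size;
}

// Mean recall@k across a set of queries.
function meanRecallAtK(
  runs: { ranked: string[]; relevant: Set<string> }[],
  k: number,
): number {
  if (runs.length === 0) return 0;
  const total = runs.reduce((sum, run) => sum + recallAtK(run.ranked, run.relevant, k), 0);
  return total / runs.length;
}

// Hypothetical document IDs, for illustration only.
const exampleRuns = [
  // "b" is in the top 5, "f" is ranked sixth: recall@5 = 1/2
  { ranked: ["a", "b", "c", "d", "e", "f"], relevant: new Set(["b", "f"]) },
  // The single relevant item is ranked first: recall@5 = 1/1
  { ranked: ["x", "y", "z"], relevant: new Set(["x"]) },
];

console.log(meanRecallAtK(exampleRuns, 5)); // 0.75
```

A benchmark figure such as the one quoted above would come from averaging this kind of score over many queries per dataset and comparing the averages of two models.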
Voyage Code 2 also performs well on general-purpose text retrieval, exceeding OpenAI text-embedding-3-large by 3.03% and Cohere Embed v3 by 4.93%. You can use a single embedding model for both code and documentation retrieval rather than maintaining separate indices with different models.
Voyage AI evaluates it on Python, C++, and Java, plus documentation and usage patterns for Matplotlib, NumPy, Pandas, PyTorch, SciPy, scikit-learn, and TensorFlow. The model handles both natural language queries searching for code (text-to-code) and code snippets searching for similar code (code-to-code).
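Both retrieval modes reduce to the same operation once everything is embedded: compare a query vector against stored vectors and keep the closest ones. The following is a minimal sketch of that ranking step using cosine similarity. The three-dimensional vectors are toy stand-ins for real embeddings, and the type, function, and file names are hypothetical.

```typescript
type IndexedItem = { id: string; vector: number[] };

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every indexed item against the query and return the best topK.
function rank(query: number[], items: IndexedItem[], topK: number): { id: string; score: number }[] {
  return items
    .map((item) => ({ id: item.id, score: cosineSimilarity(query, item.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Toy vectors standing in for embeddings of code files and docs.
const toyIndex: IndexedItem[] = [
  { id: "parse_json.py", vector: [0.9, 0.1, 0.0] },
  { id: "plot_hist.py", vector: [0.1, 0.9, 0.2] },
  { id: "README.md", vector: [0.5, 0.5, 0.1] },
];

// Stand-in for an embedded query; it could come from text or from a code snippet.
const queryVector = [0.8, 0.2, 0.0];

console.log(rank(queryVector, toyIndex, 2).map((r) => r.id)); // ["parse_json.py", "README.md"]
```

In a real pipeline the query vector would come from embedding either a natural language question (text-to-code) or a code snippet (code-to-code) with the same model used for the index. The AI SDK also exports a `cosineSimilarity` helper, so the hand-written version here is only for self-containment.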
What To Consider When Choosing a Provider
- Use case: Voyage Code 2 targets code search. For retrieval over source code, function signatures, and documentation, Voyage AI reports a 14.52% recall@5 improvement over OpenAI text-embedding-3-large.
- Successor model: Voyage AI released voyage-code-3, which supports 300+ programming languages, a 32K context window, and Matryoshka dimensionality. Use voyage-code-3 for new deployments unless you need compatibility with existing Voyage Code 2 indices.
- Mixed corpora: Despite its code focus, Voyage Code 2 exceeds OpenAI text-embedding-3-large and Cohere Embed v3 on general-purpose text retrieval, so one model can serve mixed code-and-documentation corpora without a second model.
- Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
When to Use Voyage Code 2
Best For
- Code search engines: Retrieve relevant functions, classes, or modules from natural language queries
- Code completion pipelines: Retrieve similar code patterns to ground retrieval-augmented generation
- Developer documentation search: API references, library docs, and code examples
- Mixed code and text retrieval: A single model handles both source code and natural language documentation
- ML framework documentation: Retrieval for Python-centric data science and machine learning workflows
Consider Alternatives When
- You need broader language coverage: voyage-code-3 supports 300+ programming languages, well beyond Python, C++, and Java
- You need a longer context window: voyage-code-3 offers 32K tokens versus Voyage Code 2's 16K tokens
- Your workload is general-purpose text with no code: A general-purpose embedding model like voyage-3.5 fits better
- You need Matryoshka dimensionality: voyage-code-3 supports 2048/1024/512/256 dimensions for flexible sizing
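To make the Matryoshka point concrete: embeddings trained this way are designed so that a prefix of the vector remains usable on its own, which lets you trade accuracy for storage. The sketch below shows the general client-side technique (keep the first components, then rescale to unit length). It is an assumption that this matches how a given provider produces its shorter vectors, the function name is hypothetical, and it does not apply to Voyage Code 2, whose output size is fixed.

```typescript
// Keep the first `dims` components of an embedding and rescale to unit length.
// General Matryoshka-style truncation; assumes the model was trained to support it.
function truncateEmbedding(vector: number[], dims: number): number[] {
  if (!Number.isInteger(dims) || dims <= 0 || dims > vector.length) {
    throw new RangeError(`dims must be an integer between 1 and ${vector.length}`);
  }
  const head = vector.slice(0, dims);
  const norm = Math.sqrt(head.reduce((sum, x) => sum + x * x, 0));
  // Leave an all-zero prefix untouched to avoid dividing by zero.
  return norm === 0 ? head : head.map((x) => x / norm);
}

// Toy 4-dimensional vector standing in for a real embedding.
const fullVector = [3, 4, 12, 84];
const shortVector = truncateEmbedding(fullVector, 2);

console.log(shortVector); // [0.6, 0.8]
```

Renormalizing after truncation keeps dot-product and cosine scores comparable across the shortened vectors. All vectors in one index need the same length, so pick a dimension per index rather than per query.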
Conclusion
Voyage Code 2 delivers a 14.52% code retrieval improvement over OpenAI text-embedding-3-large. If you have existing Voyage Code 2 indices, you can keep them and avoid a re-embed. For new deployments, use voyage-code-3 for its broader language coverage, longer context window, and Matryoshka dimensionality. Route requests through AI Gateway for unified access.
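Keeping an existing index only works if queries are embedded with the same model that built it, because vectors from different models are not comparable. One way to guard against accidental mixing is to store the model ID alongside the index and check it at query time. The sketch below assumes a hypothetical `IndexMetadata` record and helper name. The dimension of 1536 is taken from Voyage AI's published specs for Voyage Code 2 as best recalled here, so verify it against your own index.

```typescript
// Hypothetical metadata stored next to a vector index.
type IndexMetadata = { model: string; dimensions: number };

// Throw if a query vector was produced by a different model or has the wrong size.
function assertCompatible(index: IndexMetadata, queryModel: string, queryVector: number[]): void {
  if (index.model !== queryModel) {
    throw new Error(`Index was built with ${index.model}, but the query used ${queryModel}`);
  }
  if (index.dimensions !== queryVector.length) {
    throw new Error(`Expected ${index.dimensions} dimensions, got ${queryVector.length}`);
  }
}

// Assumed metadata for an existing Voyage Code 2 index.
const existingIndex: IndexMetadata = { model: "voyage/voyage-code-2", dimensions: 1536 };

// Placeholder query vector of the right size; a real one would come from the embedding call.
const placeholderQuery: number[] = Array.from({ length: 1536 }, () => 0);

assertCompatible(existingIndex, "voyage/voyage-code-2", placeholderQuery);
console.log("query is compatible with the existing index");
```

Moving to voyage-code-3 then becomes an explicit step: build a new index with new metadata, re-embed the corpus, and switch queries over once it is ready.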