Voyage Code 2
Voyage Code 2 is Voyage AI's code-specialized embedding model with a 16,000-token context window. It improves code retrieval recall@5 by 14.52% over OpenAI text-embedding-3-large and supports Python, C++, Java, and major ML framework documentation.
```ts
import { embed } from 'ai';

const result = await embed({
  model: 'voyage/voyage-code-2',
  value: 'function that reverses a linked list',
});
```

Frequently Asked Questions
What programming languages does Voyage Code 2 support?
Voyage Code 2 is evaluated on Python, C++, and Java, along with documentation for major ML libraries (NumPy, Pandas, PyTorch, scikit-learn, TensorFlow, Matplotlib, SciPy). It handles both text-to-code and code-to-code retrieval.
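In both text-to-code and code-to-code retrieval, queries and code snippets are embedded into the same vector space and candidates are ranked by cosine similarity. A minimal ranking sketch (the `cosineSimilarity` helper, file names, and toy vectors below are illustrative; in practice the vectors would come from `embed` or `embedMany`):

```typescript
// Rank candidate code snippets against a query by cosine similarity.
// The vectors here are toy values; real ones come from the embedding model.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function rank(query: number[], candidates: { id: string; vec: number[] }[]) {
  return candidates
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.vec) }))
    .sort((a, b) => b.score - a.score);
}

// Toy query vector and two candidate snippets (hypothetical file names).
const query = [0.9, 0.1, 0.0];
const ranked = rank(query, [
  { id: 'reverse_list.py', vec: [0.8, 0.2, 0.1] },
  { id: 'quicksort.cpp', vec: [0.1, 0.9, 0.3] },
]);
console.log(ranked[0].id); // reverse_list.py is the closest match
```

The same ranking loop works whether the query vector came from natural-language text or from another code snippet, which is what makes a single index usable for both retrieval modes.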
How does Voyage Code 2 compare to OpenAI text-embedding-3-large on code?
Voyage Code 2 achieves a 14.52% improvement in recall@5 over OpenAI text-embedding-3-large across 11 code retrieval datasets derived from HumanEval, APPS, MBPP, DS-1000, CodeChef, and LeetCode.
Should I use Voyage Code 2 or voyage-code-3?
For new deployments, voyage-code-3 is the recommended choice. It supports 300+ programming languages (vs. Python, C++, Java), a 32K context window (vs. 16K), and flexible Matryoshka dimensions. Use Voyage Code 2 if you have existing indices and want to avoid re-embedding.
Can Voyage Code 2 handle general text retrieval too?
Yes. Voyage Code 2 exceeds OpenAI text-embedding-3-large by 3.03% on general-purpose text retrieval, making it viable for mixed code-and-documentation corpora.
What is the context window for Voyage Code 2?
16,000 tokens. This handles most individual source files and documentation pages. For larger code contexts, use voyage-code-3 with its 32K-token window.
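Files that exceed the window must be split before embedding. A rough chunking sketch, assuming ~4 characters per token as a heuristic (an assumption for illustration; the model's actual tokenizer ratio varies by language and content):

```typescript
// Split source text into line-aligned chunks that should fit a token budget,
// using a rough ~4 characters-per-token heuristic (an assumption, not the
// model's actual tokenizer).
const CHARS_PER_TOKEN = 4;

function chunkByTokenBudget(source: string, maxTokens: number): string[] {
  const maxChars = maxTokens * CHARS_PER_TOKEN;
  const chunks: string[] = [];
  let current = '';
  for (const line of source.split('\n')) {
    // Start a new chunk when adding this line would exceed the budget.
    if (current.length + line.length + 1 > maxChars && current.length > 0) {
      chunks.push(current);
      current = '';
    }
    current += (current ? '\n' : '') + line;
  }
  if (current) chunks.push(current);
  return chunks;
}

// A 16K-token window fits most single files; chunk anything larger,
// then embed each chunk separately.
const chunks = chunkByTokenBudget('line of code\n'.repeat(50_000), 16_000);
```

Splitting on line boundaries keeps chunks syntactically coherent; a production pipeline would typically split on function or class boundaries instead.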
How do I authenticate Voyage Code 2 through Vercel AI Gateway?
Add your Voyage AI API key in the AI Gateway settings, then send embedding requests through AI Gateway; it attaches the correct credentials and authenticates requests across providers.
What retrieval tasks is Voyage Code 2 evaluated on?
11 code retrieval datasets totaling 43,909 query-document pairs, derived from HumanEval, APPS, MBPP, DS-1000, CodeChef, and LeetCode. The evaluation metric is recall@5.