Voyage Code 2

Voyage Code 2 is Voyage AI's code-specialized embedding model with a context window of 16,000 tokens. It improves code retrieval by 14.52% (recall@5) over OpenAI text-embedding-3-large and supports Python, C++, Java, and major ML framework documentation.

index.ts
import { embed } from 'ai';

// Embed a natural-language query for code retrieval.
const { embedding } = await embed({
  model: 'voyage/voyage-code-2',
  value: 'How do I reverse a linked list in Python?',
});

Frequently Asked Questions

  • What programming languages does Voyage Code 2 support?

    Voyage Code 2 is evaluated on Python, C++, and Java, along with documentation for major ML libraries (NumPy, Pandas, PyTorch, scikit-learn, TensorFlow, Matplotlib, SciPy). It handles both text-to-code and code-to-code retrieval.

  • How does Voyage Code 2 compare to OpenAI text-embedding-3-large on code?

    Voyage Code 2 achieves a 14.52% improvement in recall@5 over OpenAI text-embedding-3-large across 11 code retrieval datasets derived from HumanEval, APPS, MBPP, DS-1000, CodeChef, and LeetCode.
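The recall@5 metric behind that number can be computed as follows; the result objects here are hypothetical, for illustration only:

```typescript
// recall@k: the fraction of queries whose relevant document appears
// among the top-k retrieved results.
function recallAtK(
  results: { relevantId: string; rankedIds: string[] }[],
  k: number,
): number {
  const hits = results.filter((r) =>
    r.rankedIds.slice(0, k).includes(r.relevantId),
  ).length;
  return hits / results.length;
}

// Two of three queries surface the right document in the top 5,
// so recall@5 = 2/3.
const score = recallAtK(
  [
    { relevantId: 'a', rankedIds: ['x', 'a', 'y', 'z', 'w'] },
    { relevantId: 'b', rankedIds: ['b', 'x', 'y', 'z', 'w'] },
    { relevantId: 'c', rankedIds: ['x', 'y', 'z', 'w', 'v'] },
  ],
  5,
);
```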

  • Should I use Voyage Code 2 or voyage-code-3?

    For new deployments, voyage-code-3 is the recommended choice. It supports 300+ programming languages (vs. Python, C++, Java), a 32K context window (vs. 16K), and flexible Matryoshka dimensions. Use Voyage Code 2 if you have existing indices and want to avoid re-embedding.

  • Can Voyage Code 2 handle general text retrieval too?

    Yes. Voyage Code 2 exceeds OpenAI text-embedding-3-large by 3.03% on general-purpose text retrieval, making it viable for mixed code-and-documentation corpora.

  • What is the context window for Voyage Code 2?

16,000 tokens. This handles most individual source files and documentation pages. For larger code contexts, use voyage-code-3 with its 32K-token window.
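Files that do exceed the window must be split before embedding. A minimal sketch, assuming roughly 4 characters per token (a heuristic; actual counts depend on the tokenizer) and splitting on line boundaries:

```typescript
// Split a source file into chunks that stay under a token budget,
// approximating tokens as characters / 4 and breaking only at newlines.
function chunkByLines(source: string, maxTokens: number): string[] {
  const budget = maxTokens * 4; // ~4 chars per token (heuristic)
  const chunks: string[] = [];
  let current = '';
  for (const line of source.split('\n')) {
    if (current && current.length + line.length + 1 > budget) {
      chunks.push(current);
      current = line;
    } else {
      current = current ? current + '\n' + line : line;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each chunk is then embedded separately; using a generous safety margin below the 16K limit avoids truncation when the character-per-token estimate is off.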

  • How do I authenticate Voyage Code 2 through Vercel AI Gateway?

Add your Voyage AI API key in the AI Gateway settings, then send embedding requests through AI Gateway. The gateway attaches the correct provider credentials to each request, so your application manages a single gateway credential across all providers.

  • What retrieval tasks is Voyage Code 2 evaluated on?

    11 code retrieval datasets totaling 43,909 query-document pairs, derived from HumanEval, APPS, MBPP, DS-1000, CodeChef, and LeetCode. The evaluation metric is recall@5.