Voyage Code 2

Voyage Code 2 is Voyage AI's code-specialized embedding model with a 16K-token context window. It improves recall@5 on code retrieval by 14.52% over OpenAI text-embedding-3-large and is evaluated on Python, C++, Java, and documentation for major ML frameworks.

index.ts
import { embed } from 'ai';

// Embed a single string through AI Gateway using the model's gateway slug.
const result = await embed({
  model: 'voyage/voyage-code-2',
  value: 'Sunny day at the beach',
});

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

  • Provider: Voyage AI
  • Input: $0.12/M
  • Release Date: 01/01/2024
  • Legal: Terms, Privacy

No values are listed for Context, Latency, Throughput, Output, Cache, Web Search, Per Query, Capabilities, ZDR, or No Training.

More models by Voyage AI

  • Context: 32K, Input: $0.05/M, Provider: Voyage AI, Release Date: 08/11/2025
  • Input: $0.18/M, Provider: Voyage AI, Release Date: 09/01/2024
  • Input: $0.12/M, Provider: Voyage AI, Release Date: 03/01/2024
  • Context: 32K, Input: $0.02/M, Provider: Voyage AI
  • Context: 32K, Input: $0.12/M, Provider: Voyage AI
  • Context: 32K, Input: $0.06/M, Provider: Voyage AI

About Voyage Code 2

Voyage Code 2 is Voyage AI's code-specialized embedding model, released January 1, 2024. It has a 16K-token context window and targets code retrieval, code completion, and code assistant applications. On code retrieval tasks across 11 datasets derived from HumanEval, APPS, MBPP, DS-1000, CodeChef, and LeetCode, Voyage Code 2 achieves a 14.52% improvement in recall@5 over OpenAI text-embedding-3-large.
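For readers unfamiliar with the metric, here is a minimal sketch of how recall@5 can be computed. It assumes each query has exactly one relevant document, which is a simplification suggested by the "query-document pairs" wording and not a description of Voyage AI's actual evaluation harness. The `RankedQuery` type and the `recallAtK` helper are hypothetical names introduced for illustration.

```typescript
// Assumed setup: one relevant document per query (not confirmed by the source).
type RankedQuery = {
  relevantId: string;  // id of the document judged relevant for this query
  rankedIds: string[]; // retrieved document ids, best match first
};

// Fraction of queries whose relevant document appears in the top k results.
function recallAtK(queries: RankedQuery[], k: number): number {
  if (queries.length === 0) return 0;
  const hits = queries.filter((q) =>
    q.rankedIds.slice(0, k).includes(q.relevantId),
  ).length;
  return hits / queries.length;
}

// Toy data: the first query hits at rank 1, the second only at rank 6.
const sample: RankedQuery[] = [
  { relevantId: 'a', rankedIds: ['a', 'b', 'c', 'd', 'e', 'f'] },
  { relevantId: 'f', rankedIds: ['a', 'b', 'c', 'd', 'e', 'f'] },
];

console.log(recallAtK(sample, 5)); // 0.5
```

A higher recall@5 means the correct snippet lands in the first five results more often, which matters when only a handful of retrieved snippets are passed to a code assistant.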

Voyage Code 2 also performs well on general-purpose text retrieval, exceeding OpenAI text-embedding-3-large by 3.03% and Cohere Embed v3 by 4.93%. You can use a single embedding model for both code and documentation retrieval rather than maintaining separate indices with different models.

Voyage AI evaluates it on Python, C++, and Java, plus documentation and usage patterns for Matplotlib, NumPy, Pandas, PyTorch, SciPy, scikit-learn, and TensorFlow. The model handles both natural language queries searching for code (text-to-code) and code snippets searching for similar code (code-to-code).
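Both retrieval modes reduce to the same step once everything is embedded: compare a query vector against stored snippet vectors and keep the closest matches. The sketch below shows that ranking step with cosine similarity over precomputed vectors. The three-dimensional vectors are toy values standing in for real embeddings, and `IndexedSnippet`, `cosineSimilarity`, and `topK` are hypothetical helpers written for this example, not part of any Voyage AI or AI Gateway API.

```typescript
type IndexedSnippet = { id: string; vector: number[] };

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Rank every indexed snippet against the query and keep the k best.
function topK(
  query: number[],
  index: IndexedSnippet[],
  k: number,
): { id: string; score: number }[] {
  return index
    .map((s) => ({ id: s.id, score: cosineSimilarity(query, s.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy vectors only; in practice these would come from the embedding model.
const snippetIndex: IndexedSnippet[] = [
  { id: 'quickSort', vector: [0.1, 0.9, 0] },
  { id: 'binarySearch', vector: [0.9, 0.1, 0] },
  { id: 'parseJson', vector: [0, 0, 1] },
];
const queryVector = [1, 0, 0]; // stands in for an embedded text or code query

console.log(topK(queryVector, snippetIndex, 2).map((r) => r.id));
```

The query vector can come from a natural language question (text-to-code) or from a code snippet (code-to-code); the ranking logic does not change. A production system would normally delegate this search to a vector database instead of a linear scan.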

What To Consider When Choosing a Provider

  • Configuration: Voyage Code 2 targets code search. If you're embedding source code, function signatures, and documentation for retrieval, it improves recall@5 by 14.52% over OpenAI text-embedding-3-large on Voyage AI's code retrieval datasets.
  • Configuration: Voyage AI released voyage-code-3, which supports 300+ programming languages, a 32K context window, and Matryoshka dimensionality. Use voyage-code-3 for new deployments unless you need compatibility with existing Voyage Code 2 indices.
  • Configuration: Despite its code focus, Voyage Code 2 exceeds OpenAI text-embedding-3-large by 3.03% and Cohere Embed v3 by 4.93% on general-purpose text retrieval. Use it for mixed code-and-documentation corpora without a second model.
  • Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
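The compatibility point above exists because vectors produced by different embedding models are not comparable, so an index built with Voyage Code 2 must also be queried with Voyage Code 2. One way to enforce this is to store the model ID alongside the index and check it before each search. The sketch below assumes a hypothetical `IndexMetadata` record and `assertCompatible` helper; the dimension value is a toy number, not the model's real output size.

```typescript
// Hypothetical metadata saved when the index was built.
type IndexMetadata = { modelId: string; dimensions: number };

// Throws if the query embedding cannot be compared against the index.
function assertCompatible(
  index: IndexMetadata,
  queryModelId: string,
  queryVector: number[],
): void {
  if (index.modelId !== queryModelId) {
    throw new Error(
      `Index was built with ${index.modelId} but the query uses ${queryModelId}; ` +
        're-embed the corpus before switching models.',
    );
  }
  if (index.dimensions !== queryVector.length) {
    throw new Error(
      `Index stores ${index.dimensions}-dimensional vectors but the query has ${queryVector.length}.`,
    );
  }
}

// Toy dimension of 4 for illustration only.
const existingIndex: IndexMetadata = { modelId: 'voyage/voyage-code-2', dimensions: 4 };

assertCompatible(existingIndex, 'voyage/voyage-code-2', [0.1, 0.2, 0.3, 0.4]); // passes
```

Failing fast here turns a silent relevance regression into an explicit error, which is useful during a gradual migration to a newer model.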

When to Use Voyage Code 2

Best For

  • Code search engines: Retrieve relevant functions, classes, or modules from natural language queries
  • Code completion pipelines: Retrieval-augmented generation finds similar code patterns
  • Developer documentation search: API references, library docs, and code examples
  • Mixed code and text retrieval: A single model handles both source code and natural language documentation
  • ML framework documentation: Retrieval for Python-centric data science and machine learning workflows

Consider Alternatives When

  • You need broader language coverage: voyage-code-3 supports 300+ programming languages beyond Python, C++, and Java
  • You need a longer context window: voyage-code-3 offers 32K tokens versus Voyage Code 2's 16K tokens
  • Your workload is general-purpose text with no code: A general-purpose embedding model like voyage-3.5 fits better
  • You need Matryoshka dimensionality: voyage-code-3 supports 2048/1024/512/256 dimensions for flexible sizing
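Matryoshka dimensionality means a shorter prefix of the full embedding is still a usable embedding, so you can trade retrieval quality for storage. The sketch below shows the generic client-side version of that idea: keep the first N components and re-normalize to unit length. `truncateEmbedding` is a hypothetical helper; whether the hosted API can also return a reduced dimension directly is not covered here, so treat this as an illustration of the technique and not of any provider's interface.

```typescript
// Keep the first `dimensions` components and rescale to unit length.
function truncateEmbedding(vector: number[], dimensions: number): number[] {
  if (
    !Number.isInteger(dimensions) ||
    dimensions <= 0 ||
    dimensions > vector.length
  ) {
    throw new RangeError(
      `dimensions must be an integer between 1 and ${vector.length}`,
    );
  }
  const head = vector.slice(0, dimensions);
  const norm = Math.sqrt(head.reduce((sum, x) => sum + x * x, 0));
  // An all-zero prefix cannot be normalized; return it unchanged.
  return norm === 0 ? head : head.map((x) => x / norm);
}

// Toy 4-dimensional vector reduced to 2 dimensions.
const shortened = truncateEmbedding([3, 4, 12, 0], 2);
console.log(shortened); // [0.6, 0.8]
```

This only works for models trained with a Matryoshka objective. Truncating Voyage Code 2 vectors this way is not supported by anything in the source and would likely degrade retrieval.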

Conclusion

Voyage Code 2 delivers a 14.52% code retrieval improvement over OpenAI text-embedding-3-large. If you have existing Voyage Code 2 indices, you can keep them and avoid a re-embed. For new deployments, use voyage-code-3 for its broader language coverage, longer context window, and Matryoshka dimensionality. Route requests through AI Gateway for unified access.

Frequently Asked Questions

  • What programming languages does Voyage Code 2 support?

    Voyage Code 2 is evaluated on Python, C++, and Java, along with documentation for major ML libraries (NumPy, Pandas, PyTorch, scikit-learn, TensorFlow, Matplotlib, SciPy). It handles both text-to-code and code-to-code retrieval.

  • How does Voyage Code 2 compare to OpenAI text-embedding-3-large on code?

    Voyage Code 2 achieves a 14.52% improvement in recall@5 over OpenAI text-embedding-3-large across 11 code retrieval datasets derived from HumanEval, APPS, MBPP, DS-1000, CodeChef, and LeetCode.

  • Should I use Voyage Code 2 or voyage-code-3?

    For new deployments, voyage-code-3 is the recommended choice. It supports 300+ programming languages (vs. Python, C++, Java), a 32K context window (vs. 16K), and flexible Matryoshka dimensions. Use Voyage Code 2 if you have existing indices and want to avoid re-embedding.

  • Can Voyage Code 2 handle general text retrieval too?

    Yes. Voyage Code 2 exceeds OpenAI text-embedding-3-large by 3.03% on general-purpose text retrieval, making it viable for mixed code-and-documentation corpora.

  • What is the context window for Voyage Code 2?

    16K tokens. This handles most individual source files and documentation pages. For larger code contexts, use voyage-code-3 with its 32K-token window.

  • How do I authenticate Voyage Code 2 through Vercel AI Gateway?

    Authenticate to AI Gateway with an API key or OIDC token, then send embedding requests using the voyage/voyage-code-2 model slug. AI Gateway handles authentication with the provider, so you do not need to manage Voyage AI credentials directly.

  • What retrieval tasks is Voyage Code 2 evaluated on?

    11 code retrieval datasets totaling 43,909 query-document pairs, derived from HumanEval, APPS, MBPP, DS-1000, CodeChef, and LeetCode. The evaluation metric is recall@5.
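On the context window question above: a file that exceeds the model's window has to be split before embedding. The following is a minimal sketch of line-based chunking under a token budget. It assumes roughly four characters per token, which is a crude heuristic and not Voyage AI's tokenizer, so leave headroom below the real limit. `estimateTokens` and `chunkByLines` are hypothetical helpers.

```typescript
// Assumption: ~4 characters per token. Real tokenizers will differ.
const APPROX_CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / APPROX_CHARS_PER_TOKEN);
}

// Greedily pack whole lines into chunks that stay within maxTokens.
// A single line larger than the budget becomes its own oversized chunk.
function chunkByLines(source: string, maxTokens: number): string[] {
  const chunks: string[] = [];
  let current: string[] = [];
  let currentTokens = 0;

  for (const line of source.split('\n')) {
    const lineTokens = estimateTokens(line + '\n');
    if (current.length > 0 && currentTokens + lineTokens > maxTokens) {
      chunks.push(current.join('\n'));
      current = [];
      currentTokens = 0;
    }
    current.push(line);
    currentTokens += lineTokens;
  }
  if (current.length > 0) chunks.push(current.join('\n'));
  return chunks;
}

// Tiny budget so the split is visible; a real budget would sit below the window size.
const parts = chunkByLines('aaaa\nbbbb\ncccc\ndddd', 4);
console.log(parts); // ['aaaa\nbbbb', 'cccc\ndddd']
```

Splitting on line boundaries keeps statements intact; splitting on function or class boundaries usually retrieves better but requires a parser for each language.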