Voyage Code 3

voyage/voyage-code-3

Voyage Code 3 is Voyage AI's code-specialized embedding model, with a 32K-token context window, support for 300+ programming languages, and Matryoshka dimensionality. It outperforms OpenAI text-embedding-3-large by 13.80% on code retrieval across 32 datasets.

index.ts
import { embed } from 'ai';

// Request an embedding for a single input through AI Gateway.
const result = await embed({
  model: 'voyage/voyage-code-3',
  value: 'Sunny day at the beach',
});

What To Consider When Choosing a Provider

  • Zero Data Retention

    AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.

  • Authentication

    AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.

Voyage Code 3 covers 300+ programming languages, the widest language coverage in Voyage AI's code embedding lineup. If your codebase spans multiple languages or includes less common languages, this breadth eliminates the need for language-specific models.

At 1024 dimensions, Voyage Code 3 retains 92.28% of its full-precision quality, compared to 77.64% for OpenAI text-embedding-3-large. Dimension reduction therefore costs Voyage Code 3 comparatively little accuracy, which enables large-scale code search indices at lower storage cost.
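Matryoshka-style embeddings are generally reduced by keeping the leading components of the full vector. The sketch below illustrates that idea under stated assumptions: `truncateEmbedding` is a hypothetical helper (not part of the AI SDK or Voyage AI's API), and it assumes the shortened vector should be re-normalized to unit length before cosine or dot-product comparison.

```typescript
// Hypothetical helper: keep the first `dims` components of a Matryoshka
// embedding and rescale the result to unit L2 norm.
function truncateEmbedding(embedding: number[], dims: number): number[] {
  if (!Number.isInteger(dims) || dims <= 0 || dims > embedding.length) {
    throw new RangeError(`dims must be an integer in 1..${embedding.length}`);
  }
  const head = embedding.slice(0, dims);
  const norm = Math.sqrt(head.reduce((sum, x) => sum + x * x, 0));
  // A zero vector cannot be normalized, so return it unchanged.
  return norm === 0 ? head : head.map((x) => x / norm);
}
```

For example, a 2048-dimensional vector stored once could be passed through `truncateEmbedding(vector, 256)` to build a smaller index without re-embedding the corpus, provided the model's shorter embeddings are prefixes of the longer ones.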

Compared with its predecessor, Voyage Code 3 doubles the context window (32K vs. 16K), adds Matryoshka dimensionality, and expands language coverage from a handful of languages to 300+. For new code search deployments, Voyage Code 3 is the recommended option.

When to Use Voyage Code 3

Best For

  • Enterprise code search:

    Polyglot repositories with hundreds of programming languages

  • RAG for code generation:

    Retrieving relevant code examples, patterns, and documentation improves generated output quality

  • Large-scale code indexing:

    Binary or int8 embeddings at 256-1024 dimensions keep storage costs manageable across millions of files

  • Text-to-code retrieval:

    Natural language queries surface relevant functions, classes, and modules

  • Code-to-code similarity:

    Detecting duplicates, finding related implementations, and recommending refactoring targets

  • Developer tools and IDE integrations:

    Fast, accurate code search served as a backend service
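Text-to-code and code-to-code retrieval both come down to ranking stored snippet embeddings by similarity to a query embedding. The following is a minimal sketch of that ranking step, assuming cosine similarity as the metric and a brute-force in-memory scan; `IndexedSnippet`, `cosineSimilarity`, and `topK` are hypothetical names for illustration, and a production index would typically use a vector database instead.

```typescript
// Hypothetical shape for an indexed code snippet and its embedding.
interface IndexedSnippet {
  id: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new RangeError("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // Treat comparisons against a zero vector as "no similarity".
  return denom === 0 ? 0 : dot / denom;
}

// Score every snippet against the query and return the k best matches.
function topK(
  query: number[],
  index: IndexedSnippet[],
  k: number,
): { id: string; score: number }[] {
  return index
    .map((s) => ({ id: s.id, score: cosineSimilarity(query, s.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

The same `topK` call serves both use cases: pass the embedding of a natural language query for text-to-code search, or the embedding of an existing function for code-to-code similarity.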

Consider Alternatives When

  • Your retrieval corpus is primarily natural language:

    voyage-3.5 or voyage-3-large are better general-purpose choices when there's little or no code

  • Your workload is exclusively in one domain like legal or finance:

    Domain-specific models may provide marginal accuracy gains

  • You need multimodal embeddings:

    Pick a model with native image inputs for screenshots and diagrams

  • Maximum cost efficiency is required:

    voyage-3.5-lite handles code as one of its eight evaluated domains when accuracy requirements are moderate

Conclusion

Voyage Code 3 achieves a 13.80% improvement in code retrieval quality over OpenAI text-embedding-3-large across 32 datasets and covers 300+ programming languages. Its Matryoshka dimensionality and quantization options make it practical for indexing large codebases at scale. Route requests through AI Gateway for unified access, usage tracking, and the flexibility to switch providers without changing your integration.

FAQ

Voyage Code 3 supports over 300 programming languages. That is the widest language coverage in Voyage AI's code embedding lineup, suitable for polyglot repositories and enterprise codebases.

Compared with its predecessor, Voyage Code 3 doubles the context window (32K vs. 16K), adds Matryoshka dimensionality (2048/1024/512/256), expands language coverage from a few languages to 300+, and adds quantization-aware training. It is the recommended choice for new code search deployments.

Binary embeddings at 256 dimensions use 1/384th the storage of 3072-dimensional float embeddings while still outperforming OpenAI text-embedding-3-large by 4.81%. This makes it feasible to index very large codebases efficiently.
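The 1/384 figure can be checked with simple arithmetic. The sketch below assumes "float" means 32-bit floats and that binary embeddings use one bit per dimension; `bytesPerVector` and `indexSizeBytes` are hypothetical helpers for illustration.

```typescript
type Precision = "float32" | "int8" | "binary";

// Bits needed to store one dimension at each precision
// (assumption: floats are 32-bit, binary is bit-packed).
const BITS_PER_DIMENSION: Record<Precision, number> = {
  float32: 32,
  int8: 8,
  binary: 1,
};

// Storage for a single embedding vector, in bytes.
function bytesPerVector(dims: number, precision: Precision): number {
  return (dims * BITS_PER_DIMENSION[precision]) / 8;
}

// Raw vector storage for a whole index, ignoring metadata and index overhead.
function indexSizeBytes(numVectors: number, dims: number, precision: Precision): number {
  return numVectors * bytesPerVector(dims, precision);
}

const baseline = bytesPerVector(3072, "float32"); // 12288 bytes
const compact = bytesPerVector(256, "binary"); // 32 bytes
const ratio = baseline / compact; // 384
```

Under these assumptions, one million 256-dimensional binary vectors occupy about 32 MB of raw vector storage, versus roughly 12.3 GB for the same number of 3072-dimensional float vectors.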

At 1024 dimensions, Voyage Code 3 retains 92.28% of its full-precision quality, compared to 77.64% for OpenAI text-embedding-3-large. This means dimension reduction is significantly more effective with Voyage Code 3.

Voyage AI trained Voyage Code 3 on trillions of tokens combining text, code, and mathematical content, supplemented with real-world query-code pairs from GitHub repositories for practical retrieval relevance.

Add your Voyage AI API key in AI Gateway settings, then send embedding requests through AI Gateway. You can point the same client code at different provider models; AI Gateway authenticates requests and records usage.

Voyage Code 3 supports both directions of code retrieval. It is evaluated on text-to-code retrieval (natural language queries finding relevant code) and code-to-code retrieval (finding similar implementations), and both are core strengths of the model.