Voyage Law 2
Voyage Law 2 is Voyage AI's legal-specialized embedding model, trained on one trillion legal tokens. It outperforms OpenAI text-embedding-3-large by 6% on average across eight legal retrieval datasets and achieves 84.44 NDCG@10 on long-context legal retrieval, versus 68.40 for text-embedding-3-large.
import { embed } from 'ai';
const result = await embed({ model: 'voyage/voyage-law-2', value: 'Sunny day at the beach' });
Providers
Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
About Voyage Law 2
Voyage Law 2 is Voyage AI's legal-specialized embedding model, released March 1, 2024. Voyage AI trained it on one trillion high-quality legal tokens using specifically designed positive pairs and a contrastive learning algorithm. The model handles diverse legal content including contracts, congressional bills, court cases, and statutes across multiple jurisdictions: U.S., Chinese, German, and Indian.
Across eight legal retrieval datasets, Voyage Law 2 outperforms OpenAI text-embedding-3-large by 6% on average, with improvements exceeding 10% on LeCaRDv2, LegalQuAD, and GerDaLIR. On long-context legal retrieval, Voyage Law 2 achieves 84.44 NDCG@10 compared to 68.40 for text-embedding-3-large. That is roughly a 23% relative improvement, reflecting the model's strength on lengthy legal documents.
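The 23% figure and the NDCG@10 metric can be reproduced with a few lines of arithmetic. The sketch below assumes the common NDCG definition with linear gain and a log2 rank discount. The evaluation setup Voyage AI actually used is not described here, so the function names and the gain formula are illustrative assumptions.

```typescript
// Discounted cumulative gain over the first k results.
// relevances[i] is the graded relevance of the result at rank i + 1.
// Assumption: linear gain (rel) with a log2(rank + 1) discount.
function dcgAtK(relevances: number[], k: number): number {
  return relevances
    .slice(0, k)
    .reduce((sum, rel, i) => sum + rel / Math.log2(i + 2), 0);
}

// NDCG@k: DCG of the actual ranking divided by the DCG of the ideal ranking.
// Assumes `relevances` contains a judgment for every relevant document,
// so sorting it in descending order yields the ideal ordering.
function ndcgAtK(relevances: number[], k: number): number {
  const ideal = [...relevances].sort((a, b) => b - a);
  const idealDcg = dcgAtK(ideal, k);
  return idealDcg === 0 ? 0 : dcgAtK(relevances, k) / idealDcg;
}

// Relative improvement of `score` over `baseline`, as a fraction.
function relativeImprovement(score: number, baseline: number): number {
  return (score - baseline) / baseline;
}

// The long-context scores quoted above.
const gain = relativeImprovement(84.44, 68.4);
console.log(`${(gain * 100).toFixed(1)}%`); // 23.5%, which the text rounds to 23%
```

A perfect ranking scores 1.0 (reported as 100 on the 0-100 scale used above), so 84.44 means the ranking recovers about 84% of the ideal discounted gain in the top ten results.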
Voyage AI intentionally mixed legal training data with finance, technology, and intellectual property domains. This ensures Voyage Law 2 performs well on non-legal retrieval tasks while maintaining its legal specialization. Teams with mixed legal and business content don't need a separate general-purpose model for non-legal documents in the same index.
What To Consider When Choosing a Provider
- Long-context retrieval: Voyage Law 2 excels on long-context legal retrieval, achieving 84.44 NDCG@10 versus 68.40 for OpenAI text-embedding-3-large. If your legal corpus contains lengthy contracts, court opinions, or statutory texts, this is where Voyage Law 2 provides the largest accuracy gains.
- Jurisdiction coverage: Voyage AI trained and evaluated Voyage Law 2 on U.S., Chinese, German, and Indian legal content. If your practice spans these jurisdictions, Voyage Law 2 handles cross-jurisdictional retrieval within a single index.
- Newer alternatives: Voyage AI's voyage-3-large now outperforms domain-specific models on legal benchmarks. For new deployments with mixed legal and non-legal content, evaluate voyage-3-large or voyage-3.5 as alternatives.
- Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
When to Use Voyage Law 2
Best For
- Legal research and discovery: Contracts, case law, and statutes where domain-specific terminology and document structure matter
- Long-document legal retrieval: Contracts and court opinions that span many pages, where key passages appear deep in the text
- Multi-jurisdictional legal search: U.S., Chinese, German, and Indian legal content within a single index
- RAG for legal applications: Retrieving precise statutory language, case citations, and contractual clauses improves generated output
- Compliance and regulatory retrieval: Searches across regulations, guidance, and enforcement actions from multiple legal systems
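The retrieval step behind these use cases is a nearest-neighbor search over embeddings. The following is a minimal sketch, assuming document and query vectors have already been produced by the embedding model. The `IndexedDoc` shape, the `search` function, and the jurisdiction codes are hypothetical names chosen for illustration, and the toy three-dimensional vectors stand in for real embeddings.

```typescript
// One entry in a single mixed-jurisdiction index.
// Field names are hypothetical, not part of any Voyage AI or AI Gateway API.
interface IndexedDoc {
  id: string;
  jurisdiction: string; // e.g. "US", "DE", "IN"
  embedding: number[];
}

// Cosine similarity between two vectors; returns 0 if either has zero norm.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Rank documents by similarity to the query, optionally restricted
// to one jurisdiction, and return the top k matches.
function search(
  query: number[],
  docs: IndexedDoc[],
  k: number,
  jurisdiction?: string,
): { id: string; score: number }[] {
  return docs
    .filter((d) => jurisdiction === undefined || d.jurisdiction === jurisdiction)
    .map((d) => ({ id: d.id, score: cosine(query, d.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Toy vectors standing in for real embeddings.
const docs: IndexedDoc[] = [
  { id: "us-contract", jurisdiction: "US", embedding: [1, 0, 0] },
  { id: "de-statute", jurisdiction: "DE", embedding: [0.8, 0.6, 0] },
  { id: "in-case", jurisdiction: "IN", embedding: [0, 1, 0] },
];

const hits = search([1, 0, 0], docs, 2);
console.log(hits.map((h) => h.id)); // ["us-contract", "de-statute"]
```

In a RAG pipeline, the text of the top hits would be passed to the generating model as context. A production system would typically replace the linear scan with a vector database. The AI SDK also exports a `cosineSimilarity` helper that could stand in for the hand-written `cosine` function.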
Consider Alternatives When
- Your content spans multiple domains beyond legal: voyage-3.5 or voyage-3-large provide cross-domain retrieval that includes legal content
- You want Matryoshka dimensionality and quantization: voyage-3.5 offers these while covering legal content
- Your legal documents are primarily short-form: Clauses and headnotes benefit less from long-context strength
- You need code or financial retrieval alongside legal content: A general-purpose model avoids managing multiple specialized indices
Conclusion
Voyage Law 2 provides measurable accuracy gains for legal document retrieval, particularly on long-context tasks, where it outperforms OpenAI text-embedding-3-large by roughly 23% in relative terms. Its training on one trillion legal tokens across multiple jurisdictions makes it well suited for legal research, discovery, and compliance workflows. Access it through AI Gateway for unified provider management and the flexibility to combine it with other embedding models as your needs evolve.
Frequently Asked Questions
What legal document types does Voyage Law 2 handle?
Contracts, congressional bills, court cases, and statutes. It was trained and evaluated on U.S., Chinese, German, and Indian legal content.
How does Voyage Law 2 perform on long legal documents?
Voyage Law 2 achieves 84.44 NDCG@10 on long-context legal retrieval, compared to 68.40 for OpenAI text-embedding-3-large. This 23% relative improvement reflects its strength on the lengthy documents common in legal work.
Should I use Voyage Law 2 or voyage-3-large for legal retrieval?
Voyage AI's voyage-3-large now outperforms domain-specific models on legal benchmarks. For mixed legal and non-legal content, voyage-3-large or voyage-3.5 may be simpler. Use Voyage Law 2 if your workload is exclusively legal and you already have indices built with it.
Does Voyage Law 2 work for non-legal content?
Yes. Voyage AI intentionally trained Voyage Law 2 on mixed legal, financial, technical, and intellectual property content. It performs well on non-legal retrieval tasks while maintaining its legal specialization.
What jurisdictions does Voyage Law 2 cover?
U.S., Chinese, German, and Indian legal systems. The model handles cross-jurisdictional retrieval within a single vector index.
How do I access Voyage Law 2 through Vercel AI Gateway?
Add your Voyage AI API key in AI Gateway settings, then send embedding requests through AI Gateway. AI Gateway authenticates requests and records usage. You can combine Voyage Law 2 with other embedding models across providers.
How much legal training data was used for Voyage Law 2?
One trillion high-quality legal tokens, with specifically designed positive pairs and a contrastive learning algorithm. The training data spans contracts, legislation, and case law across multiple jurisdictions.