Voyage Law 2 is Voyage AI's legal-specialized embedding model, released March 1, 2024. Voyage AI trained it on one trillion high-quality legal tokens, using purpose-built positive pairs and a contrastive learning algorithm. The model handles diverse legal content, including contracts, congressional bills, court cases, and statutes, drawn from U.S., Chinese, German, and Indian jurisdictions.
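To make "positive pairs and a contrastive learning algorithm" concrete, here is a minimal sketch of one common contrastive objective, InfoNCE with in-batch negatives. This is an assumption for illustration only: the source does not say which loss Voyage AI used, and the function name `info_nce_loss` and the temperature value are hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce_loss(queries, positives, temperature=0.05):
    """Generic InfoNCE loss (NOT confirmed as Voyage AI's training objective).

    queries[i] and positives[i] form a positive pair; every other
    positives[j] in the batch serves as a negative for queries[i].
    """
    q = [normalize(v) for v in queries]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, pj) / temperature for pj in p]
        # Numerically stable log-sum-exp over all candidates in the batch.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching positive as the correct class.
        total += log_denominator - logits[i]
    return total / len(q)


# Toy batch: each query vector points the same way as its positive.
aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
# Same vectors, but the pairs are swapped, so each query matches a negative.
mismatched = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

The loss is small when each query is closest to its own positive and large when it is closer to another item in the batch, which is the pressure that pulls paired legal texts together in embedding space.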
Across eight legal retrieval datasets, Voyage Law 2 outperforms OpenAI text-embedding-3-large by 6% on average, with improvements exceeding 10% on LeCaRDv2, LegalQuAD, and GerDaLIR. On long-context legal retrieval, Voyage Law 2 achieves 84.44 NDCG@10 compared to 68.40 for the OpenAI model, a relative improvement of about 23% ((84.44 - 68.40) / 68.40 ≈ 0.23) that reflects the model's strength on lengthy legal documents.
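For readers unfamiliar with the metric, the following sketch shows how NDCG@10 is typically computed and how the roughly 23% relative improvement follows from the two reported scores. The helper names and the toy relevance list are illustrative assumptions. Only 84.44 and 68.40 come from the text above, and the sketch assumes the relevance list covers every judged document for the query.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain of the top-k results (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG divided by the DCG of the ideal (best possible) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def relative_improvement(new, baseline):
    """Percentage gain of `new` over `baseline`."""
    return (new - baseline) / baseline * 100.0


# Hypothetical ranking: relevant documents at positions 1 and 3.
toy_score = ndcg_at_k([1, 0, 1, 0, 0])

# Scores reported in the text (NDCG@10 on long-context legal retrieval).
gain = relative_improvement(84.44, 68.40)
print(f"toy NDCG@10 = {toy_score:.4f}, relative improvement = {gain:.2f}%")
```

Because NDCG discounts relevant results that appear lower in the ranking, a higher score means relevant legal documents land nearer the top of the first ten results.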
Voyage AI intentionally mixed legal training data with data from the finance, technology, and intellectual property domains so that Voyage Law 2 performs well on non-legal retrieval tasks while keeping its legal specialization. As a result, teams with mixed legal and business content may not need a separate general-purpose model for the non-legal documents in the same index.
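A single-index setup of this kind might look like the sketch below, where one embedding function serves both legal and business documents. The `search` helper, the `toy_embedder` stand-in, and the sample documents are hypothetical. `voyage_embedder` assumes the `voyageai` Python client is installed and that `VOYAGE_API_KEY` is set in the environment.

```python
import math


def voyage_embedder(texts, input_type):
    """Embed texts with voyage-law-2 (assumes the voyageai package and an API key)."""
    import voyageai  # imported lazily so the rest of the sketch runs without it

    client = voyageai.Client()  # reads VOYAGE_API_KEY from the environment
    result = client.embed(list(texts), model="voyage-law-2", input_type=input_type)
    return result.embeddings


def toy_embedder(texts, input_type):
    """Offline stand-in: word counts over a tiny vocabulary (illustration only)."""
    vocab = ["contract", "liability", "revenue", "quarterly"]
    return [[float(t.lower().split().count(w)) for w in vocab] for t in texts]


def cosine(a, b):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def search(query, documents, embed, top_k=3):
    """Rank all documents, legal or not, against the query in one shared index."""
    doc_vecs = embed(documents, "document")
    query_vec = embed([query], "query")[0]
    scored = [(doc, cosine(query_vec, vec)) for doc, vec in zip(documents, doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


docs = [
    "Contract clause limiting liability",  # legal content
    "Quarterly revenue report",            # business content
]
hits = search("liability under the contract", docs, toy_embedder, top_k=1)
```

Swapping `toy_embedder` for `voyage_embedder` would send both document types through the same model, which is the simplification the paragraph above describes.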