Voyage Code 2 is Voyage AI's code-specialized embedding model, released in January 2024. It has a context window of 16,000 tokens and targets code retrieval, code completion, and code assistant applications. On code retrieval tasks across 11 datasets derived from HumanEval, APPS, MBPP, DS-1000, CodeChef, and LeetCode, Voyage Code 2 achieves a 14.52% improvement in recall@5 over OpenAI text-embedding-3-large.
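For readers unfamiliar with the metric, recall@5 measures how many of the relevant items for a query show up in the top five retrieved results. The sketch below uses the standard definition; the function names are illustrative, and the exact evaluation protocol behind the reported figures is not specified here.

```python
def recall_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of the relevant items that appear in the top-k ranked results."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must be non-empty")
    top_k = set(ranked_ids[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(runs, k=5):
    """Average recall@k over (ranked_ids, relevant_ids) pairs, one pair per query."""
    if not runs:
        raise ValueError("runs must be non-empty")
    return sum(recall_at_k(ranked, relevant, k) for ranked, relevant in runs) / len(runs)
```

A query with two relevant snippets, only one of which is ranked in the top five, scores 0.5 under this definition.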
Voyage Code 2 also performs well on general-purpose text retrieval, exceeding OpenAI text-embedding-3-large by 3.03% and Cohere Embed v3 by 4.93%. As a result, you can use a single embedding model for both code and documentation retrieval rather than maintaining separate indices with different models.
Voyage AI evaluates Voyage Code 2 on Python, C++, and Java code, plus documentation and usage patterns for Matplotlib, NumPy, Pandas, PyTorch, SciPy, scikit-learn, and TensorFlow. The model handles both natural language queries that search for code (text-to-code) and code snippets that search for similar code (code-to-code).
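Both query styles can go through the same retrieval path. The following is a minimal sketch, assuming the `voyageai` Python client is installed and `VOYAGE_API_KEY` is set in the environment. The helper names `voyage_embed` and `search` are hypothetical, and embedding a code-snippet query with `input_type="query"` is an assumption rather than documented guidance. A production system would embed the corpus once and store the vectors in an index instead of re-embedding on every search.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def voyage_embed(texts, input_type):
    """Embed texts with voyage-code-2. Needs the voyageai package and an API key."""
    import voyageai  # imported lazily so the rest of the sketch runs without it

    client = voyageai.Client()  # reads VOYAGE_API_KEY from the environment
    result = client.embed(texts, model="voyage-code-2", input_type=input_type)
    return result.embeddings


def search(query, corpus, embed=voyage_embed, top_k=5):
    """Return the top_k corpus entries most similar to the query.

    The query may be natural language (text-to-code) or a code snippet
    (code-to-code); both are embedded the same way in this sketch.
    """
    doc_vecs = embed(corpus, "document")
    query_vec = embed([query], "query")[0]
    ranked = sorted(
        zip(corpus, doc_vecs),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,
    )
    return [doc for doc, _ in ranked[:top_k]]
```

Passing a different `embed` callable makes it possible to exercise the ranking logic offline, without calling the API.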