Released May 28, 2025, Codestral Embed is Mistral AI's first embedding model purpose-built for code. The model achieves an 85% average score on code retrieval benchmarks, outperforming Voyage Code 3, Cohere Embed v4.0, and OpenAI's large embedding model on evaluations derived from real-world code data.
Codestral Embed supports variable dimensions with ordered relevance: dimensions are ranked by how much they contribute to retrieval quality, so you can truncate an embedding to its first n dimensions and retain most of its usefulness. Mistral AI's benchmarks show strong retrieval performance even at 256 dimensions with int8 precision, enabling substantial index size reduction without proportional quality loss.
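The ordered-relevance property means truncation is just slicing off the trailing coordinates. A minimal sketch of client-side truncation, assuming embeddings arrive as plain float vectors; the L2 renormalization step is a common practice for keeping cosine similarity well-behaved after truncation, not something the announcement specifies:

```python
import numpy as np

def truncate_embedding(vec, n_dims):
    """Keep the first n_dims coordinates of an embedding whose dimensions
    are ordered by relevance, then L2-renormalize so cosine similarity
    stays meaningful on the shortened vector."""
    truncated = np.asarray(vec, dtype=np.float32)[:n_dims]
    norm = np.linalg.norm(truncated)
    return truncated / norm if norm > 0 else truncated

# Example: a full embedding cut down to its 2 leading dimensions.
short = truncate_embedding([3.0, 4.0, 0.5, 12.0], 2)
```

In practice you would store the truncated vectors in your index; a 256-dimension int8 index is a fraction of the size of a full-precision, full-dimension one.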
The context window is 8,192 tokens. For repositories with large files, Mistral AI recommends chunking at 3,000 characters with 1,000-character overlap. This balances retrieval recall against chunk boundary artifacts.
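The recommended chunking scheme can be sketched as a sliding window with a 2,000-character stride (3,000-character chunks minus the 1,000-character overlap); the helper name is illustrative, not from Mistral's tooling:

```python
def chunk_text(text, chunk_size=3000, overlap=1000):
    """Split text into overlapping chunks: each chunk is chunk_size chars,
    and consecutive chunks share `overlap` chars (stride = size - overlap)."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    stride = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), stride):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks
```

Each chunk is then embedded independently; the overlap reduces the chance that a function or class definition is split across a boundary and lost to retrieval.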