Codestral Embed

Codestral Embed is Mistral AI's first embedding model specialized for code, outperforming general-purpose and competing code embedding models on real-world retrieval benchmarks.

index.ts
import { embed } from 'ai';

// Request an embedding for a single input string.
const result = await embed({
  model: 'mistral/codestral-embed',
  value: 'Sunny day at the beach',
});

About Codestral Embed

Released May 28, 2025, Codestral Embed is Mistral AI's first embedding model purpose-built for code. It achieves an 85% average score on code retrieval benchmarks, outperforming Voyage Code 3, Cohere Embed v4.0, and OpenAI's large embedding model on evaluations derived from real-world code data.

Codestral Embed supports variable output dimensions, and the dimensions are ordered by relevance. You can keep only the first n dimensions and still get a usable embedding, trading some quality for lower storage cost. Mistral AI's benchmarks show strong retrieval performance even at 256 dimensions with int8 precision, so you can shrink the index without a proportional loss in quality.
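To make the truncation idea concrete, here is a minimal client-side sketch. The helper names `truncateEmbedding` and `quantizeInt8` are hypothetical and not part of any SDK. Rescaling to unit length after truncation and the symmetric int8 scheme are common practice, assumed here for illustration, and not necessarily what Mistral AI does server-side.

```typescript
// Hypothetical helper: keep the first `dims` components of an embedding.
// Assumption: rescaling to unit length afterwards so cosine and dot-product
// scores stay comparable across vectors.
function truncateEmbedding(embedding: number[], dims: number): number[] {
  if (dims <= 0 || dims > embedding.length) {
    throw new RangeError(`dims must be between 1 and ${embedding.length}`);
  }
  const head = embedding.slice(0, dims);
  const norm = Math.sqrt(head.reduce((sum, x) => sum + x * x, 0));
  // Leave an all-zero vector untouched to avoid dividing by zero.
  return norm === 0 ? head : head.map((x) => x / norm);
}

// Hypothetical helper: symmetric int8 quantization.
// Maps the range [-maxAbs, maxAbs] onto [-127, 127]; `scale` converts back.
function quantizeInt8(vector: number[]): { values: Int8Array; scale: number } {
  const maxAbs = vector.reduce((m, x) => Math.max(m, Math.abs(x)), 0);
  const scale = maxAbs === 0 ? 1 : maxAbs / 127;
  const values = Int8Array.from(vector, (x) => Math.round(x / scale));
  return { values, scale };
}
```

With these helpers, something like `quantizeInt8(truncateEmbedding(result.embedding, 256))` would store one byte per dimension plus a single scale factor per vector. Whether that matches the precision settings used in Mistral AI's benchmarks is not confirmed here.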

The context window is 8,192 tokens. For repositories with large files, Mistral AI recommends chunking at 3,000 characters with a 1,000-character overlap rather than filling the full context. This balances retrieval recall against chunk boundary artifacts.
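The chunking recommendation above can be sketched as a simple sliding window. The function name `chunkText` is hypothetical, and this version splits purely by character offset. A production chunker would likely prefer to break on line or function boundaries, which is not shown here.

```typescript
// Hypothetical helper: split text into fixed-size chunks whose starts are
// (chunkSize - overlap) characters apart, so neighbours share `overlap` chars.
// Note: sizes are counted in UTF-16 code units, as with String.prototype.slice.
function chunkText(text: string, chunkSize = 3000, overlap = 1000): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError('require chunkSize > 0 and 0 <= overlap < chunkSize');
  }
  const step = chunkSize - overlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk reaches the end so no redundant tail chunk is emitted.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

Each chunk can then be embedded separately, for example by passing the resulting array to a batch embedding call, and stored alongside its file path and start offset so that a retrieval hit can be mapped back to the source.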