
Codestral Embed

Codestral Embed is Mistral AI's first embedding model specialized for code, outperforming general-purpose and competing code embedding models on real-world retrieval benchmarks.

index.ts

import { embed } from 'ai';

// Embed a code snippet; the result object carries the embedding vector.
const { embedding } = await embed({
  model: 'mistral/codestral-embed',
  value: 'function add(a: number, b: number) { return a + b; }',
});

Frequently Asked Questions

  • What makes Codestral Embed different from a general text embedding model?

    Codestral Embed was trained on real-world code data and optimized for code retrieval tasks. These tasks involve matching function signatures, logic patterns, and structural similarities that general text models don't capture well.

  • What is the context window for Codestral Embed?

8,192 tokens. For files larger than this, chunk at 3,000 characters with a 1,000-character overlap.
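    Under that guidance (3,000-character chunks, 1,000-character overlap), a minimal chunker could look like the sketch below; chunkCode is an illustrative helper name, not part of any SDK:

```typescript
// Split a source file into overlapping fixed-size chunks for embedding.
// Sizes default to the chunking guidance above (3,000 chars, 1,000 overlap).
function chunkCode(source: string, size = 3000, overlap = 1000): string[] {
  const chunks: string[] = [];
  const stride = size - overlap; // each chunk starts 2,000 chars after the last
  for (let start = 0; start < source.length; start += stride) {
    chunks.push(source.slice(start, start + size));
    if (start + size >= source.length) break; // last chunk reached end of file
  }
  return chunks;
}
```

    Each chunk is then embedded individually; the overlap keeps logic that straddles a chunk boundary fully visible in at least one chunk.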

  • What embedding dimensions does Codestral Embed support?

    Variable dimensions with ordered relevance: you can keep the first n dimensions for a quality-versus-cost tradeoff. Codestral Embed scores well on retrieval at 256 dimensions with int8 precision in published benchmarks.
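    Because the dimensions are ordered by relevance, truncation amounts to keeping the head of the vector and re-normalizing so cosine comparisons still behave. A minimal sketch (truncateEmbedding is an illustrative name, not an SDK function):

```typescript
// Keep the first `dims` dimensions of an embedding and re-normalize
// to unit length so cosine/dot-product comparisons remain meaningful.
function truncateEmbedding(embedding: number[], dims: number): number[] {
  const head = embedding.slice(0, dims);
  const norm = Math.sqrt(head.reduce((sum, x) => sum + x * x, 0));
  return head.map((x) => x / norm);
}
```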

  • What is the pricing for Codestral Embed?

    Check the pricing panel on this page for today's numbers. AI Gateway tracks rates across every provider that serves Codestral Embed.

  • Can Codestral Embed be used for duplicate code detection?

    Yes. Semantic similarity via embeddings is an effective approach for identifying duplicate or near-duplicate code patterns that differ syntactically but are logically equivalent.
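    A sketch of that approach: embed each snippet, then compare pairs by cosine similarity. Both function names and the 0.9 threshold are illustrative assumptions you would tune on your own corpus:

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Flag a pair as a likely duplicate above a tuned threshold (0.9 is illustrative).
function isLikelyDuplicate(a: number[], b: number[], threshold = 0.9): boolean {
  return cosineSimilarity(a, b) >= threshold;
}
```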

  • How does Codestral Embed compare to Mistral AI Embed?

    Mistral AI Embed is a general-purpose text embedding model. Codestral Embed was trained specifically for code and outperforms general models on code retrieval benchmarks. Use Codestral Embed when your corpus is source code; use Mistral AI Embed for natural language documents.