Text Multilingual Embedding 002

Text Multilingual Embedding 002 is a multilingual text embedding model achieving a 56.2% average score on the 18-language MIRACL (Multilingual Information Retrieval Across a Continuum of Languages) benchmark. It is designed for cross-lingual semantic search and retrieval across diverse language corpora.

index.ts
import { embed } from 'ai';

// Embed a single string; `embedding` is the model's vector for the input.
const { embedding } = await embed({
  model: 'google/text-multilingual-embedding-002',
  value: 'Sunny day at the beach',
});

Frequently Asked Questions

  • How many languages does text-multilingual-embedding-002 support?

    The model is evaluated on MIRACL, which covers 18 languages. Text Multilingual Embedding 002 scores 56.2% on average on this benchmark. Consult the Vertex AI documentation for the complete list of supported languages.

  • What is MIRACL and how does it differ from MTEB?

    MIRACL (Multilingual Information Retrieval Across a Continuum of Languages) is a retrieval benchmark covering 18 languages, used to evaluate cross-lingual information retrieval quality. MTEB (Massive Text Embedding Benchmark) is a predominantly English benchmark spanning eight task categories. The two models in this family are each evaluated on the benchmark most relevant to their design target.

  • Can users query in one language and retrieve results in another?

    Yes. This is the key capability of a shared multilingual embedding space. Text from all supported languages is mapped into the same vector space, so a query in Japanese and a matching document in Arabic will have similar vector representations, enabling cross-lingual retrieval without query translation.
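    In practice, cross-lingual retrieval reduces to nearest-neighbor search in that shared space. The sketch below ranks candidate documents by cosine similarity against a query vector; the `rank` helper is this example's own, and the tiny vectors stand in for embeddings that text-multilingual-embedding-002 would return.

    ```typescript
    // Cosine similarity between two embedding vectors.
    function cosineSimilarity(a: number[], b: number[]): number {
      let dot = 0, normA = 0, normB = 0;
      for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
      }
      return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Rank documents (in any language) against a query embedding.
    function rank(query: number[], docs: { id: string; vector: number[] }[]) {
      return docs
        .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.vector) }))
        .sort((a, b) => b.score - a.score);
    }
    ```

    Because the space is shared, the same `rank` call works whether the query and documents are in the same language or different ones.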

  • Does this model support dynamic embedding sizes?

    Yes. Like text-embedding-005, it uses Matryoshka Representation Learning to support multiple output dimension sizes. Smaller dimensions reduce vector storage and compute costs with a minor quality tradeoff.
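    One property of Matryoshka-trained embeddings is that a full-size vector can be shrunk client-side by keeping a prefix of its components and re-normalizing. A minimal sketch of that, assuming the vector came from an MRL-trained model (the `truncateEmbedding` helper name is ours):

    ```typescript
    // Keep the first `dim` components of an MRL embedding and re-normalize
    // to unit length, trading a little quality for smaller storage.
    function truncateEmbedding(vector: number[], dim: number): number[] {
      const prefix = vector.slice(0, dim);
      const norm = Math.sqrt(prefix.reduce((s, x) => s + x * x, 0));
      return prefix.map((x) => x / norm);
    }
    ```

    Providers may also accept an output-dimension parameter on the embedding request itself; consult the Vertex AI documentation for the exact field and supported sizes.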

  • When should I use this model versus text-embedding-005?

    Use text-multilingual-embedding-002 whenever your application must handle content or queries in multiple languages. Use text-embedding-005 for strictly English-language applications where maximum MTEB benchmark performance is the priority.

  • What is the pricing for this model?

    Refer to the pricing panel on this page for current rates. AI Gateway tracks pricing across every provider that serves Text Multilingual Embedding 002.

  • Does this model work for cross-lingual classification tasks?

    Yes. The shared vector space means that classifiers trained on labeled data in one language can classify documents in other supported languages, which is useful for content moderation, sentiment analysis, and topic categorization across multilingual corpora.
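    As a sketch of that pattern, a nearest-centroid classifier fitted on labeled embeddings in one language can score embeddings from any supported language, since all of them live in the same space. The helpers and toy vectors below are illustrative, not part of any SDK:

    ```typescript
    type Labeled = { label: string; vector: number[] };

    // Fit: average the embedding vectors of each labeled class.
    function centroids(examples: Labeled[]): Map<string, number[]> {
      const sums = new Map<string, { sum: number[]; n: number }>();
      for (const { label, vector } of examples) {
        const entry = sums.get(label) ?? { sum: vector.map(() => 0), n: 0 };
        entry.sum = entry.sum.map((s, i) => s + vector[i]);
        entry.n += 1;
        sums.set(label, entry);
      }
      return new Map(
        [...sums].map(([label, { sum, n }]) => [label, sum.map((s) => s / n)])
      );
    }

    // Predict: pick the label whose centroid has the highest cosine score.
    function classify(vector: number[], cents: Map<string, number[]>): string {
      let best = "";
      let bestScore = -Infinity;
      for (const [label, c] of cents) {
        const dot = c.reduce((s, x, i) => s + x * vector[i], 0);
        const norm =
          Math.sqrt(c.reduce((s, x) => s + x * x, 0)) *
          Math.sqrt(vector.reduce((s, x) => s + x * x, 0));
        const score = dot / norm;
        if (score > bestScore) {
          bestScore = score;
          best = label;
        }
      }
      return best;
    }
    ```

    Training examples could be, say, English product reviews, while the vectors passed to `classify` come from embedding reviews in any of the other supported languages.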

  • Do I need to detect the language of input text before embedding it?

    No. The model handles all 18 supported languages from a single endpoint. Language detection and routing are not required: submit text in any supported language and the model produces an embedding in the shared multilingual vector space.