Text Multilingual Embedding 002

Text Multilingual Embedding 002 is a multilingual text embedding model achieving a 56.2% average score on the 18-language MIRACL (Multilingual Information Retrieval Across a Continuum of Languages) benchmark. It is designed for cross-lingual semantic search and retrieval across diverse language corpora.

index.ts
import { embed } from 'ai';

// Embed a single string; `embedding` is the model's vector for the input.
const { embedding } = await embed({
  model: 'google/text-multilingual-embedding-002',
  value: 'Sunny day at the beach',
});

Frequently Asked Questions

  • How many languages does text-multilingual-embedding-002 support?

    The model is evaluated on MIRACL, which covers 18 languages. Text Multilingual Embedding 002 scores 56.2% on average on this benchmark. Consult the Vertex AI documentation for the complete list of supported languages.

  • What is MIRACL and how does it differ from MTEB?

    MIRACL (Multilingual Information Retrieval Across a Continuum of Languages) is a retrieval benchmark covering 18 languages, used to evaluate cross-lingual information retrieval quality. MTEB (Massive Text Embedding Benchmark) is a predominantly English benchmark spanning eight task categories. The two models in this family are each evaluated on the benchmark most relevant to their design target.

  • Can users query in one language and retrieve results in another?

    Yes. This is the key capability of a shared multilingual embedding space. Text from all supported languages is mapped into the same vector space, so a query in Japanese and a matching document in Arabic will have similar vector representations, enabling cross-lingual retrieval without query translation.
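    In practice, cross-lingual retrieval reduces to nearest-neighbor search in that shared space. The sketch below ranks candidate documents by cosine similarity against a query vector; the `rank` helper is this example's own, and the tiny vectors stand in for embeddings that text-multilingual-embedding-002 would return.

    ```typescript
    // Cosine similarity between two embedding vectors.
    function cosineSimilarity(a: number[], b: number[]): number {
      let dot = 0, normA = 0, normB = 0;
      for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
      }
      return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Rank documents (in any language) against a query embedding.
    function rank(query: number[], docs: { id: string; vector: number[] }[]) {
      return docs
        .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.vector) }))
        .sort((a, b) => b.score - a.score);
    }
    ```

    Because the space is shared, the same `rank` call works whether the query and documents are in the same language or different ones.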

  • Does this model support dynamic embedding sizes?

    Yes. Like text-embedding-005, it uses Matryoshka Representation Learning to support multiple output dimension sizes. Smaller dimensions reduce vector storage and compute costs with a minor quality tradeoff.
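    One property of Matryoshka-trained embeddings is that a full-size vector can be shrunk client-side by keeping a prefix of its components and re-normalizing. A minimal sketch of that, assuming the vector came from an MRL-trained model (the `truncateEmbedding` helper name is ours):

    ```typescript
    // Keep the first `dim` components of an MRL embedding and re-normalize
    // to unit length, trading a little quality for smaller storage.
    function truncateEmbedding(vector: number[], dim: number): number[] {
      const prefix = vector.slice(0, dim);
      const norm = Math.sqrt(prefix.reduce((s, x) => s + x * x, 0));
      return prefix.map((x) => x / norm);
    }
    ```

    Providers may also accept an output-dimension parameter on the embedding request itself; consult the Vertex AI documentation for the exact field and supported sizes.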

  • When should I use this model versus text-embedding-005?

    Use text-multilingual-embedding-002 whenever your application must handle content or queries in multiple languages. Use text-embedding-005 for strictly English-language applications where maximum MTEB benchmark performance is the priority.

  • What is the pricing for this model?

    Refer to the pricing panel on this page for current rates. AI Gateway tracks pricing across every provider that serves Text Multilingual Embedding 002.

  • Does this model work for cross-lingual classification tasks?

    Yes. The shared vector space means that classifiers trained on labeled data in one language can classify documents in other supported languages, which is useful for content moderation, sentiment analysis, and topic categorization across multilingual corpora.
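    As a sketch of that pattern, a nearest-centroid classifier fitted on labeled embeddings in one language can score embeddings from any supported language, since all of them live in the same space. The helpers and toy vectors below are illustrative, not part of any SDK:

    ```typescript
    type Labeled = { label: string; vector: number[] };

    // Fit: average the embedding vectors of each labeled class.
    function centroids(examples: Labeled[]): Map<string, number[]> {
      const sums = new Map<string, { sum: number[]; n: number }>();
      for (const { label, vector } of examples) {
        const entry = sums.get(label) ?? { sum: vector.map(() => 0), n: 0 };
        entry.sum = entry.sum.map((s, i) => s + vector[i]);
        entry.n += 1;
        sums.set(label, entry);
      }
      return new Map(
        [...sums].map(([label, { sum, n }]) => [label, sum.map((s) => s / n)])
      );
    }

    // Predict: pick the label whose centroid has the highest cosine score.
    function classify(vector: number[], cents: Map<string, number[]>): string {
      let best = "";
      let bestScore = -Infinity;
      for (const [label, c] of cents) {
        const dot = c.reduce((s, x, i) => s + x * vector[i], 0);
        const norm =
          Math.sqrt(c.reduce((s, x) => s + x * x, 0)) *
          Math.sqrt(vector.reduce((s, x) => s + x * x, 0));
        const score = dot / norm;
        if (score > bestScore) {
          bestScore = score;
          best = label;
        }
      }
      return best;
    }
    ```

    Training examples could be, say, English product reviews, while the vectors passed to `classify` come from embedding reviews in any of the other supported languages.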

  • Do I need to detect the language of input text before embedding it?

    No. The model handles all 18 supported languages from a single endpoint. Language detection and routing are not required: submit text in any supported language and the model produces an embedding in the shared multilingual vector space.