Text Multilingual Embedding 002

Text Multilingual Embedding 002 is an 18-language text embedding model from Google that achieves a 56.2% average score on the MIRACL (Multilingual Information Retrieval Across a Continuum of Languages) benchmark. It is designed for cross-lingual semantic search and retrieval across diverse language corpora.

index.ts
import { embed } from 'ai';

const result = await embed({
  model: 'google/text-multilingual-embedding-002',
  value: 'Sunny day at the beach',
});

console.log(result.embedding);

About Text Multilingual Embedding 002

Text-multilingual-embedding-002 is Google's embedding model purpose-built for multilingual natural language processing (NLP) applications. Released alongside text-embedding-005 at Google Cloud Next '24, it uses the same Gecko architecture but targets cross-lingual coverage rather than maximum English-language benchmark performance. Its primary evaluation benchmark is MIRACL (Multilingual Information Retrieval Across a Continuum of Languages), which covers 18 languages; the model achieves a 56.2% average score on it.

The practical value lies in vector space alignment across languages. Rather than running separate monolingual models for each language in your corpus, text-multilingual-embedding-002 embeds content from all 18 supported languages into a shared semantic space. A query submitted in one language can surface relevant documents written in any other supported language, without a translation step. For global products, international content platforms, or multilingual knowledge bases, this shared embedding space eliminates the complexity of language detection and routing.

Like its English-only sibling, text-multilingual-embedding-002 supports dynamic embedding sizes through Matryoshka Representation Learning (MRL). You can choose smaller dimension outputs to reduce vector storage and compute costs, with a minor quality tradeoff. This flexibility matters for multilingual applications where the corpus may be significantly larger than a monolingual equivalent.
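The MRL property can be illustrated locally. The sketch below assumes you already hold a full-size embedding: an MRL-trained vector can be truncated to its first k dimensions and re-normalized, trading a small amount of quality for proportionally smaller storage. (The Vertex AI embeddings API also accepts an output-dimensionality parameter so the service returns the smaller vector directly; `truncateEmbedding` here is a hypothetical client-side helper.)

```typescript
// Truncate an MRL embedding to its first k dimensions and re-normalize
// to unit length, so cosine similarity remains meaningful.
function truncateEmbedding(vec: number[], k: number): number[] {
  const head = vec.slice(0, k);
  const norm = Math.hypot(...head);
  return head.map((x) => x / norm);
}

// e.g. shrink a 768-dim vector to 256 dims before indexing:
// const small = truncateEmbedding(fullVector, 256);
```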

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

Provider: Google Vertex (Legal: Terms, Privacy)
Input price: $0.03/M tokens
Release date: 03/01/2024
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.


What To Consider When Choosing a Provider

  • Configuration: For multilingual retrieval applications, this model maps text from all supported languages into the same vector space. That enables cross-lingual queries: for example, a user querying in Japanese can retrieve documents written in Spanish without a query translation layer.
  • Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
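A minimal authentication setup might look like the following. This is a sketch under an assumption: the environment-variable name is the convention described in the AI Gateway docs, so verify it there before relying on it.

```shell
# Assumed env var name per AI Gateway docs; treat as an assumption.
export AI_GATEWAY_API_KEY="your-gateway-key"
# AI SDK calls made from this environment then authenticate through
# the gateway; no Google Cloud credentials are needed on your side.
```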

When to Use Text Multilingual Embedding 002

Best For

  • Multilingual semantic search: Applications serving users who query in different languages than the indexed content
  • Cross-lingual document retrieval: Knowledge base search across international content corpora
  • Global customer support: Systems where user questions and knowledge base articles span multiple languages
  • Multilingual clustering and classification: Tasks that need consistent semantic representations across languages
  • International content platforms: E-commerce or media indexing product descriptions or articles in multiple languages

Consider Alternatives When

  • English-only corpus: Your corpus and users are exclusively English-language (consider google/text-embedding-005 for higher MTEB scores)
  • Unsupported language needed: You require a language not covered by the model's 18-language evaluation set; verify support in the Vertex AI documentation
  • Peak English retrieval quality: Multilingual support is not required and maximum English performance is the primary criterion

Conclusion

Text-multilingual-embedding-002 solves the core infrastructure challenge of multilingual retrieval: maintaining a single vector index that serves queries and documents across 18 languages without translation layers or per-language model management. For global applications where your user base and content corpus span multiple languages, it provides the embedding foundation that makes cross-lingual semantic search tractable.

Frequently Asked Questions

  • How many languages does text-multilingual-embedding-002 support?

    The model is evaluated on MIRACL, which covers 18 languages. Text Multilingual Embedding 002 scores 56.2% on average on this benchmark. Consult the Vertex AI documentation for the complete list of supported languages.

  • What is MIRACL and how does it differ from MTEB?

MIRACL (Multilingual Information Retrieval Across a Continuum of Languages) is a multilingual retrieval benchmark covering 18 languages, used to evaluate cross-lingual information retrieval quality. MTEB (Massive Text Embedding Benchmark) is an English-focused benchmark covering eight task categories. The two models in this family are each evaluated on the benchmark most relevant to their design target.

  • Can users query in one language and retrieve results in another?

    Yes. This is the key capability of a shared multilingual embedding space. Text from all supported languages is mapped into the same vector space, so a query in Japanese and a matching document in Arabic will have similar vector representations, enabling cross-lingual retrieval without query translation.

  • Does this model support dynamic embedding sizes?

    Yes. Like text-embedding-005, it uses Matryoshka Representation Learning to support multiple output dimension sizes. Smaller dimensions reduce vector storage and compute costs with a minor quality tradeoff.

  • When should I use this model versus text-embedding-005?

    Use text-multilingual-embedding-002 whenever your application must handle content or queries in multiple languages. Use text-embedding-005 for strictly English-language applications where maximum MTEB benchmark performance is the priority.

  • What is the pricing for this model?

    Check the pricing panel on this page for today's numbers. AI Gateway tracks rates across every provider that serves Text Multilingual Embedding 002.

  • Does this model work for cross-lingual classification tasks?

    Yes. The shared vector space means that classifiers trained on labeled data in one language can classify documents in other supported languages, which is useful for content moderation, sentiment analysis, and topic categorization across multilingual corpora.
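One simple way to exploit that shared space is a nearest-centroid classifier, sketched below with hypothetical helpers (`centroid`, `classify`). The idea: compute per-label centroids from labeled embeddings in one language, then score documents from any supported language against the same centroids by cosine similarity.

```typescript
// Average a set of embedding vectors into one centroid per label.
function centroid(vecs: number[][]): number[] {
  const dim = vecs[0].length;
  const sum = new Array(dim).fill(0);
  for (const v of vecs) for (let i = 0; i < dim; i++) sum[i] += v[i];
  return sum.map((x) => x / vecs.length);
}

// Assign the label whose centroid is most cosine-similar to the vector.
function classify(vec: number[], centroids: Record<string, number[]>): string {
  let best = '';
  let bestScore = -Infinity;
  for (const [label, c] of Object.entries(centroids)) {
    let dot = 0, nv = 0, nc = 0;
    for (let i = 0; i < vec.length; i++) {
      dot += vec[i] * c[i];
      nv += vec[i] * vec[i];
      nc += c[i] * c[i];
    }
    const score = dot / Math.sqrt(nv * nc);
    if (score > bestScore) { bestScore = score; best = label; }
  }
  return best;
}
```

In practice the centroids would be built from embeddings of, say, English training examples, and `classify` would receive embeddings of Spanish or Japanese documents produced by the same model.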

  • Do I need to detect the language of input text before embedding it?

    No. The model handles all 18 supported languages from a single endpoint. Language detection and routing are not required: submit text in any supported language and the model produces an embedding in the shared multilingual vector space.