Text Multilingual Embedding 002

Text Multilingual Embedding 002 is a text embedding model designed for cross-lingual semantic search and retrieval across corpora in multiple languages. It achieves a 56.2% average score on MIRACL (Multilingual Information Retrieval Across a Continuum of Languages), a retrieval benchmark covering 18 languages.

index.ts

import { embed } from 'ai';

const result = await embed({
  model: 'google/text-multilingual-embedding-002',
  value: 'Sunny day at the beach',
});

// result.embedding holds the embedding vector as an array of numbers.


About Text Multilingual Embedding 002

Text-multilingual-embedding-002 is Google's embedding model purpose-built for multilingual natural language processing (NLP) applications. Released alongside text-embedding-005 at Google Cloud Next '24, it uses the same Gecko architecture but targets cross-lingual coverage rather than maximum English-language benchmark performance. Its primary evaluation benchmark is MIRACL (Multilingual Information Retrieval Across a Continuum of Languages), which covers 18 languages and on which the model achieves a 56.2% average score.

The practical value lies in vector space alignment across languages. Rather than running a separate monolingual model for each language in your corpus, text-multilingual-embedding-002 embeds content from all 18 supported languages into a shared semantic space. A query submitted in one language can surface relevant documents written in any other supported language, without a translation step. For global products, international content platforms, or multilingual knowledge bases, this shared embedding space removes the need to detect each input's language and route it to a language-specific model.
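The ranking step behind this kind of cross-lingual retrieval can be sketched as follows. This is a minimal illustration, not the model's or any library's own implementation: the `EmbeddedDoc` type and the `rankBySimilarity` helper are hypothetical names, and the three-dimensional vectors are made-up stand-ins for real embeddings, which you would obtain by embedding each text with the model first.

```typescript
// Hypothetical shape for a document paired with its precomputed embedding.
interface EmbeddedDoc {
  text: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every document against the query and sort best-first.
// The document's language plays no role: only vector geometry is compared.
function rankBySimilarity(
  query: number[],
  docs: EmbeddedDoc[],
): { text: string; score: number }[] {
  return docs
    .map((doc) => ({ text: doc.text, score: cosineSimilarity(query, doc.embedding) }))
    .sort((x, y) => y.score - x.score);
}

// Toy vectors (assumed values, NOT real model output) for illustration only.
const docs: EmbeddedDoc[] = [
  { text: 'Rapport financier trimestriel', embedding: [0.0, 0.2, 0.9] },
  { text: 'Día soleado en la playa', embedding: [0.9, 0.1, 0.0] },
];

// Stand-in for the embedding of the English query 'Sunny day at the beach'.
const queryEmbedding = [1.0, 0.0, 0.1];

const ranked = rankBySimilarity(queryEmbedding, docs);
console.log(ranked.map((r) => r.text));
```

Because queries and documents share one vector space, the same ranking code serves every language pair; no language tag is stored or consulted.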

Like its English-only sibling, text-multilingual-embedding-002 supports dynamic embedding sizes through Matryoshka Representation Learning (MRL). You can choose lower-dimensional outputs to reduce vector storage and compute costs, with a minor quality tradeoff. This flexibility matters for multilingual applications, where the corpus may be significantly larger than a monolingual equivalent.
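To make the size tradeoff concrete, here is a minimal sketch of the usual way an MRL-style embedding is shortened: keep the leading dimensions and rescale the result to unit length so cosine and dot-product scores stay comparable. The `truncateEmbedding` helper is a hypothetical name, and truncating on the client is an assumption made for illustration; this page does not say how a smaller output size is requested from the model.

```typescript
// Keep the first `dimensions` components of an embedding and L2-normalize them.
// Hypothetical helper for illustration; not part of any SDK.
function truncateEmbedding(embedding: number[], dimensions: number): number[] {
  if (
    !Number.isInteger(dimensions) ||
    dimensions <= 0 ||
    dimensions > embedding.length
  ) {
    throw new RangeError('dimensions must be an integer between 1 and embedding.length');
  }
  const head = embedding.slice(0, dimensions);
  const norm = Math.sqrt(head.reduce((sum, v) => sum + v * v, 0));
  // An all-zero prefix cannot be normalized; return it unchanged.
  return norm === 0 ? head : head.map((v) => v / norm);
}

// Toy 3-dimensional vector (assumed values, not real model output).
const full = [3, 4, 12];
const short = truncateEmbedding(full, 2);
console.log(short);
```

Stored vectors shrink in proportion to the dimension count, so halving the dimensions roughly halves index size for the same number of documents.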
