Qwen3 Embedding 8B

alibaba/qwen3-embedding-8b

Qwen3 Embedding 8B is Alibaba's 8B-parameter text embedding model in the Qwen3 Embedding line. It produces 4096-dimensional vectors, ranked first on the MTEB multilingual leaderboard at release, and is built for demanding cross-lingual retrieval and RAG workloads.

index.ts
import { embed } from 'ai';

const result = await embed({
  model: 'alibaba/qwen3-embedding-8b',
  value: 'Sunny day at the beach',
});

What To Consider When Choosing a Provider

  • Zero Data Retention

    AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.

    Authentication

    AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.

For workloads indexing sensitive documents, confirm that your chosen provider's data-residency region aligns with your compliance requirements before routing production traffic.

When to Use Qwen3 Embedding 8B

Best For

  • MTEB-driven production retrieval:

    Systems where multilingual MTEB scores are the primary selection criterion

  • Long-document RAG:

    Pipelines that benefit from a 32.8K-token context and 4096-dimensional representations to preserve semantic detail

  • Cross-lingual knowledge bases:

    Indexes spanning many languages and programming environments

  • Research and evaluation workloads:

    Workloads where MTEB-style benchmarks serve as an acceptable proxy for real retrieval performance
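For the retrieval and RAG use cases above, ranking reduces to cosine similarity over stored embedding vectors. A minimal sketch of that scoring step (the helper names and toy data here are illustrative, not part of any SDK):

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return document indices sorted by descending similarity to the query.
// In production the vectors would be 4096-dim outputs of the model.
function rank(query: number[], docs: number[][]): number[] {
  return docs
    .map((d, i) => ({ i, score: cosineSimilarity(query, d) }))
    .sort((a, b) => b.score - a.score)
    .map((x) => x.i);
}
```

In a real pipeline the same ranking is usually delegated to a vector database, but the scoring function is the same.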

Consider Alternatives When

  • Tight embedding cost budgets:

    When per-token cost dominates and slightly lower accuracy is acceptable, the 0.6B or 4B variants may provide sufficient quality

  • Memory-constrained deployments:

    Environments with strict memory limits make a fully loaded 8B model impractical

  • Generative output required:

    This model produces embeddings only; use a generative model when you need text output

Conclusion

Qwen3 Embedding 8B is the right tool when retrieval accuracy across languages and domains can't be compromised. Its first-place standing on the MTEB multilingual leaderboard at release and its 4096-dimensional output make it a strong foundation for enterprise-grade semantic search and RAG systems willing to invest in embedding quality.

FAQ

How does Qwen3 Embedding 8B score on MTEB?

As of June 5, 2025, the model scored 70.58 on the MTEB multilingual leaderboard, placing it first among publicly evaluated embedding models at that date.

Can the 4096-dimensional output be reduced?

The default output is 4096-dimensional. Using Matryoshka Representation Learning (MRL), you can truncate these vectors to shorter prefix lengths for use cases where storage or query latency is constrained.
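A minimal sketch of that truncation step, assuming an MRL-trained embedding where the prefix dimensions carry the coarsest semantics (re-normalizing afterward keeps cosine similarity well behaved):

```typescript
// Truncate an MRL embedding to its first `k` dimensions and re-normalize
// to unit length so downstream cosine scoring still works as expected.
function truncateEmbedding(vector: number[], k: number): number[] {
  const prefix = vector.slice(0, k);
  const norm = Math.sqrt(prefix.reduce((sum, x) => sum + x * x, 0));
  // Guard against a zero vector rather than dividing by zero.
  return norm === 0 ? prefix : prefix.map((x) => x / norm);
}

// Example: shrink a toy 8-dim vector to 4 dims for cheaper storage.
const small = truncateEmbedding([3, 4, 0, 0, 1, 1, 1, 1], 4);
```

Note that query and document vectors must be truncated to the same length before comparison.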

How does the 8B model differ from the 4B variant?

Both the 8B and 4B models use 36 transformer layers, but the 8B model has wider layers with more parameters per layer. It produces 4096-dimensional vectors compared to 2560 for the 4B. This additional resolution typically improves performance on dense retrieval and clustering tasks, particularly for technical and multilingual corpora.

Can it embed long documents?

Yes. Each individual text input can be up to 32.8K tokens. If a document exceeds this limit it must be split into chunks before embedding.
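A naive pre-chunking sketch; the characters-per-token ratio is a rough assumption rather than the model's real tokenizer, so the limit is set with headroom. The resulting chunks can then be embedded in one call with `embedMany` from the AI SDK:

```typescript
// Rough heuristic: ~4 characters per token. Stay well below the
// 32.8K-token input limit to absorb tokenizer variance.
const MAX_TOKENS = 30_000;
const APPROX_CHARS_PER_TOKEN = 4;

function chunkDocument(text: string, maxTokens: number = MAX_TOKENS): string[] {
  const maxChars = maxTokens * APPROX_CHARS_PER_TOKEN;
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += maxChars) {
    chunks.push(text.slice(i, i + maxChars));
  }
  return chunks;
}

// Usage sketch (network call omitted here):
//   const { embeddings } = await embedMany({
//     model: 'alibaba/qwen3-embedding-8b',
//     values: chunkDocument(longDocument),
//   });
```

Production chunkers usually split on paragraph or sentence boundaries with overlap; fixed-size slicing is the simplest baseline.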

Can I steer embeddings toward a specific task?

You can prepend a task-specific instruction to your query (e.g., describing the retrieval goal) to shift the embedding space toward that intent. This is particularly effective for asymmetric retrieval where query phrasing differs significantly from document phrasing.

Can it embed source code alongside natural language?

Yes. The Qwen3 Embedding training explicitly covers multiple programming languages, so a unified vector index mixing code files and documentation is a supported pattern.