Qwen3 Embedding 8B

Qwen3 Embedding 8B is the 8B-parameter model in Alibaba's Qwen3 Embedding line. It produces 4096-dimensional vectors, ranked first on the MTEB multilingual leaderboard at release, and is built for demanding cross-lingual retrieval and RAG workloads.

index.ts

```ts
import { embed } from 'ai';

const result = await embed({
  model: 'alibaba/qwen3-embedding-8b',
  value: 'Sunny day at the beach',
});
```

Frequently Asked Questions

  • What is the MTEB multilingual leaderboard score for Qwen3 Embedding 8B?

    As of June 5, 2025, the model scored 70.58 on the MTEB multilingual leaderboard, placing it first among publicly evaluated embedding models at that date.

  • How large are the vectors this model produces?

    The default output is 4096-dimensional. Using Matryoshka Representation Learning (MRL), you can truncate these vectors to shorter prefix lengths for use cases where storage or query latency is constrained.
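    Truncating an MRL-trained embedding means keeping a prefix of the vector and re-normalizing it so cosine similarity still behaves as expected. A minimal sketch (the `truncateEmbedding` helper is illustrative, not part of any SDK):

    ```typescript
    // Keep the first `dims` components of an embedding and re-normalize
    // to unit length. With an MRL-trained model like Qwen3 Embedding 8B,
    // the prefix remains a usable lower-resolution embedding.
    function truncateEmbedding(vector: number[], dims: number): number[] {
      const prefix = vector.slice(0, dims);
      const norm = Math.sqrt(prefix.reduce((sum, x) => sum + x * x, 0));
      return prefix.map((x) => x / norm);
    }

    // e.g. keep 1024 of 4096 dimensions to cut vector storage by 4x:
    // const short = truncateEmbedding(result.embedding, 1024);
    ```

    Note that all vectors in an index must be truncated to the same length, and queries must be truncated the same way before similarity search.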

  • What distinguishes the 8B from the 4B embedding model in practice?

    Both the 8B and 4B models use 36 transformer layers, but the 8B model has wider layers with more parameters per layer. It produces 4096-dimensional vectors compared to 2560 for the 4B. This additional resolution typically improves performance on dense retrieval and clustering tasks, particularly for technical and multilingual corpora.

  • Does the context window of 32.8K tokens apply per document being embedded?

    Yes. Each individual text input can be up to 32.8K tokens. If a document exceeds this limit, it must be split into chunks before embedding.
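    A minimal chunking sketch for over-limit documents. Exact token counts require the model's tokenizer; this example approximates tokens as roughly 4 characters each, which is a rough heuristic only:

    ```typescript
    // Split text into chunks that stay under an approximate token budget.
    // Assumes ~4 characters per token; swap in a real tokenizer for
    // accurate counts near the limit.
    function chunkText(text: string, maxTokens = 32000): string[] {
      const maxChars = maxTokens * 4;
      const chunks: string[] = [];
      for (let i = 0; i < text.length; i += maxChars) {
        chunks.push(text.slice(i, i + maxChars));
      }
      return chunks;
    }
    ```

    In practice you would usually split on sentence or paragraph boundaries (and often with overlap between chunks) rather than at fixed character offsets, so that no chunk starts mid-thought.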

  • How does instruction-based customization work for this model?

    You can prepend a task-specific instruction to your query (e.g., describing the retrieval goal) to shift the embedding space toward that intent. This is particularly effective for asymmetric retrieval where query phrasing differs significantly from document phrasing.
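    As a sketch, the `Instruct: ...\nQuery: ...` layout below follows the format published for Qwen3 Embedding models; the helper name and example task text are illustrative. Documents are typically embedded without any instruction:

    ```typescript
    // Prefix a query with a task description so the embedding is
    // steered toward that retrieval intent.
    function withInstruction(task: string, query: string): string {
      return `Instruct: ${task}\nQuery: ${query}`;
    }

    // const value = withInstruction(
    //   'Given a web search query, retrieve passages that answer it',
    //   'best beaches for a sunny day',
    // );
    ```

    Because only the query side carries the instruction, the same document index can serve multiple retrieval tasks by varying the query-side task description.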

  • Is this model suitable for embedding source code alongside prose?

    Yes. The Qwen3 Embedding training explicitly covers multiple programming languages, so a unified vector index mixing code files and documentation is a supported pattern.