text-embedding-3-large
text-embedding-3-large produces 3072-dimensional vectors and posts the highest MTEB and MIRACL scores in the text-embedding-3 family. Its built-in Matryoshka dimension reduction enables flexible quality-storage tradeoffs in production retrieval systems.
```typescript
import { embed } from 'ai';

const result = await embed({
  model: 'openai/text-embedding-3-large',
  value: 'Sunny day at the beach',
});
// result.embedding is a 3072-element number[]
```

Frequently Asked Questions
How does Matryoshka dimension reduction work in practice?
The model encodes the most semantically important information into the first dimensions of each vector. When you request fewer dimensions via the dimensions parameter, you get a truncated vector that retains strong semantic structure: a 256-dimension vector from this model outperforms a full 1536-dimension ada-002 embedding on MTEB.
What is the MIRACL benchmark and why does the score matter?
MIRACL evaluates retrieval accuracy across multiple languages. text-embedding-3-large scores 54.9% versus ada-002's 31.4%, a 23.5-point gap that translates to substantially better search results when queries and documents are in different languages.
Can I embed at full 3072 dimensions and query at a lower dimension?
Yes, but the query and document dimensions must match at search time. The recommended approach is to embed your corpus at 3072 for archival accuracy, then re-embed queries at a test dimension to evaluate recall before committing to a reduced index.
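Matching dimensions at search time comes down to truncating vectors to the same length and re-normalizing so cosine similarity stays meaningful. A minimal sketch (the truncateEmbedding helper is illustrative, not part of any SDK):

```typescript
// Truncate a Matryoshka embedding to its first `dim` components and
// L2-renormalize, so cosine scores remain comparable at the reduced size.
function truncateEmbedding(vector: number[], dim: number): number[] {
  const slice = vector.slice(0, dim);
  const norm = Math.hypot(...slice);
  // Guard against a zero vector; return the raw slice in that edge case.
  return norm === 0 ? slice : slice.map((v) => v / norm);
}

// Toy example: a 4-dimensional "embedding" reduced to 2 dimensions.
const reduced = truncateEmbedding([3, 4, 0.1, 0.2], 2);
// reduced is [0.6, 0.8] -- unit length after renormalization
```

Apply the same helper to both the query vector and the index vectors so both sides always compare at an identical dimension.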
How many dimensions should I use for my application?
It depends on your recall requirements and infrastructure constraints. Start at 3072 and measure recall. If it exceeds your threshold at 1024 or 512, use the smaller size to save storage and speed up lookups. There is no universal right answer; the tradeoff is application-specific.
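One way to measure that recall is to treat full-dimension nearest neighbors as ground truth and check how many survive truncation. A brute-force sketch under that assumption (function names are illustrative):

```typescript
// Cosine similarity between two vectors of equal length.
const cosine = (a: number[], b: number[]): number => {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
};

// Indices of the k corpus vectors most similar to the query.
function topK(query: number[], corpus: number[][], k: number): Set<number> {
  return new Set(
    corpus
      .map((doc, i) => [cosine(query, doc), i] as const)
      .sort((a, b) => b[0] - a[0])
      .slice(0, k)
      .map(([, i]) => i),
  );
}

// Fraction of full-dimension top-k neighbors recovered after truncating
// both query and corpus to the candidate dimension.
function recallAtK(query: number[], corpus: number[][], dim: number, k: number): number {
  const truth = topK(query, corpus, k);
  const reduced = topK(
    query.slice(0, dim),
    corpus.map((d) => d.slice(0, dim)),
    k,
  );
  let hits = 0;
  for (const i of reduced) if (truth.has(i)) hits++;
  return hits / k;
}
```

Run this over a sample of real queries at each candidate dimension and pick the smallest dimension whose recall clears your threshold.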
Does text-embedding-3-large support batch requests?
Yes. Multiple texts can be embedded in a single API call. For indexing pipelines processing millions of documents, batching is the standard approach to maximize throughput.
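The batching step can be sketched as a simple chunking helper in front of the SDK call (the AI SDK also exports an embedMany function that takes a values array). The chunk helper and the 2-item batch size below are illustrative, not documented limits:

```typescript
// Split a corpus into consecutive batches of at most `size` items,
// preserving document order across batches.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Each batch would then become the `values` array of one request, e.g.:
//   const { embeddings } = await embedMany({
//     model: 'openai/text-embedding-3-large',
//     values: batch,
//   });
const batches = chunk(['doc-1', 'doc-2', 'doc-3', 'doc-4', 'doc-5'], 2);
// batches is [['doc-1', 'doc-2'], ['doc-3', 'doc-4'], ['doc-5']]
```

Keeping batches in order makes it straightforward to zip the returned embeddings back onto the source documents during indexing.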
What are typical latency characteristics?
This page shows live throughput and time-to-first-token metrics measured across real AI Gateway embedding traffic.