How does Matryoshka dimension reduction work in practice?

The model encodes the most semantically important information into the first dimensions of each vector. When you request fewer dimensions via the `dimensions` parameter, you get a truncated vector that retains strong semantic structure. A 256-dimension vector from this model outperforms a full 1536-dimension ada-002 embedding on MTEB.
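As a rough illustration of what truncation means, the sketch below shortens a vector on the client and rescales it to unit length. It assumes that a reduced embedding behaves like the leading components of the full one, renormalized; the function name `truncateAndNormalize` and the toy 4-dimensional vector are hypothetical, and in normal use the `dimensions` parameter does this work for you.

```typescript
// Keep the first `dims` components of an embedding and rescale to unit length.
// Assumption: leading components carry the most information (Matryoshka-style).
function truncateAndNormalize(embedding: number[], dims: number): number[] {
  if (!Number.isInteger(dims) || dims < 1 || dims > embedding.length) {
    throw new RangeError(`dims must be an integer between 1 and ${embedding.length}`);
  }
  const head = embedding.slice(0, dims);
  const length = Math.sqrt(head.reduce((sum, x) => sum + x * x, 0));
  if (length === 0) {
    throw new Error('cannot normalize a zero vector');
  }
  return head.map((x) => x / length);
}

// Toy 4-dimensional stand-in for a real 3072-dimensional embedding.
const full = [3, 4, 12, 0];
const reduced = truncateAndNormalize(full, 2);
console.log(reduced); // [0.6, 0.8]
```

Renormalizing matters when the index uses dot-product scoring, which only matches cosine similarity for unit-length vectors.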

What is the MIRACL benchmark and why does the score matter?

MIRACL evaluates retrieval accuracy across 18 languages. text-embedding-3-large scores 54.9% versus ada-002's 31.4%, a 23.5-point gap that translates to substantially better search results on non-English and mixed-language corpora.

Can I embed at full 3072 dimensions and query at a lower dimension?

Not directly: query and document vectors must have the same dimension at search time. A practical approach is to embed your corpus at 3072 for archival accuracy, then evaluate a test dimension by shortening copies of the stored vectors to that size and embedding queries at the same size. Measure recall at that dimension before committing to a reduced index.
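The matching requirement can be made explicit in code. The sketch below is a minimal cosine-similarity helper that refuses mismatched vectors, followed by one way to align a longer stored vector with a shorter query. The names `cosineSimilarity`, `storedDoc`, and `shortQuery` are hypothetical, and the vectors are toy values rather than real embeddings.

```typescript
// Cosine similarity that fails loudly when dimensions differ.
// Assumes both vectors are non-zero.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / Math.sqrt(normA * normB);
}

// Toy stand-ins: a stored full-size document vector and a reduced-size query vector.
const storedDoc = [0.5, 0.5, 0.5, 0.5];
const shortQuery = [1, 0];

// Comparing them directly would throw; slicing the document to the
// query's length aligns the two before scoring.
const score = cosineSimilarity(storedDoc.slice(0, shortQuery.length), shortQuery);
console.log(score.toFixed(4)); // 0.7071
```

Cosine similarity is scale-invariant, so the sliced document vector does not need renormalizing here; an index that scores by raw dot product would.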

How many dimensions should I use for my application?

It depends on your recall requirements and infrastructure constraints. Start at 3072 and measure recall as a baseline. If recall still meets your threshold at 1024 or 512, use the smaller size to save storage and speed up lookups. There is no universal right answer; the tradeoff is application-specific.
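One way to measure this is to check how much of the full-dimension top-k a reduced-dimension search still returns. The sketch below does that by brute force over toy vectors. It measures agreement with the full-dimension ranking, not relevance against labeled data, and it assumes every vector is non-zero in its leading components. All names (`topK`, `recallAtK`, the toy `docs` and `query`) are hypothetical.

```typescript
type Vec = number[];

// Dot product over the first `dims` components only.
function dot(a: Vec, b: Vec, dims: number): number {
  let sum = 0;
  for (let i = 0; i < dims; i++) sum += a[i] * b[i];
  return sum;
}

function norm(a: Vec, dims: number): number {
  return Math.sqrt(dot(a, a, dims));
}

// Indices of the k most similar documents, by cosine over the first `dims` components.
function topK(query: Vec, docs: Vec[], k: number, dims: number): number[] {
  return docs
    .map((doc, index) => ({
      index,
      score: dot(query, doc, dims) / (norm(query, dims) * norm(doc, dims)),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.index);
}

// Fraction of the full-dimension top-k that the reduced-dimension search also returns.
function recallAtK(query: Vec, docs: Vec[], k: number, reducedDims: number): number {
  const reference = new Set(topK(query, docs, k, query.length));
  const candidate = topK(query, docs, k, reducedDims);
  const hits = candidate.filter((index) => reference.has(index)).length;
  return hits / reference.size;
}

// Toy 4-dimensional data standing in for real embeddings.
const docs: Vec[] = [
  [1, 0, 0, 0],
  [0.9, 0.1, 0, 0.4],
  [0, 1, 0, 0],
  [0.7, 0, 0.7, 0],
];
const query: Vec = [1, 0, 0.2, 0];

console.log(recallAtK(query, docs, 2, 4)); // 1
console.log(recallAtK(query, docs, 2, 2)); // 0.5
```

On a real corpus you would average this over a sample of representative queries and compare the result against your recall threshold at each candidate dimension.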

Does text-embedding-3-large support batch requests?

Yes. Multiple texts can be embedded in a single API call. For indexing pipelines processing millions of documents, batching is the standard approach to maximize throughput.
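A minimal sketch of the client-side half of batching: splitting a list of texts into fixed-size groups so that each group can go out in one request. The helper name `chunk` and the batch size are hypothetical, and the embedding call itself (for example, the AI SDK's `embedMany`) is left out so the sketch runs offline. Check your provider's per-request input limits before picking a size.

```typescript
// Split an array into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    batches.push(items.slice(start, start + size));
  }
  return batches;
}

// Hypothetical corpus and batch size, for illustration only.
const texts = ['doc a', 'doc b', 'doc c', 'doc d', 'doc e'];
const batches = chunk(texts, 2);
console.log(batches.length); // 3
// Each entry in `batches` would then be sent as one embedding request.
```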

What are typical latency characteristics?

This page shows live throughput and time-to-first-token metrics measured across real AI Gateway embedding traffic.


text-embedding-3-large

text-embedding-3-large produces 3072-dimensional vectors with the highest MTEB and MIRACL scores in the text-embedding-3 family, with built-in Matryoshka dimension reduction for flexible quality-storage tradeoffs in production retrieval systems.

index.ts

import { embed } from 'ai';

// Embed a single string; the vector is returned as result.embedding.
const result = await embed({
  model: 'openai/text-embedding-3-large',
  value: 'Sunny day at the beach',
});


Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

Both listed providers price input at $0.13 per million tokens. Release date: 01/25/2024.


About text-embedding-3-large

OpenAI released text-embedding-3-large on January 25, 2024 as the accuracy-maximizing option in the third-generation embedding family.

The MTEB (Massive Text Embedding Benchmark) score tells the broadest story. MTEB spans retrieval, classification, clustering, and semantic similarity, and text-embedding-3-large scores 64.6% on it, outperforming its predecessor ada-002 by 3.6 points. But the multilingual gap deserves close attention. On MIRACL, a widely used multilingual retrieval benchmark, the score jumps from ada-002's 31.4% to 54.9%. That 23.5-point improvement is not incremental. It's the difference between a multilingual search system that frustrates users and one that works.

The model uses Matryoshka Representation Learning, a technique that front-loads the most important semantic information into the earliest vector dimensions. The practical consequence: you can request 256 dimensions and still outperform a full 1,536-dimension ada-002 embedding. This turns vector storage and memory from fixed infrastructure costs into tunable parameters. Teams managing indexes with hundreds of millions of documents gain a lever that directly affects their infrastructure bill.

At native 3,072 dimensions, the vectors capture the finest semantic distinctions the model can represent. Reducing dimensions trades some granularity for smaller index sizes, faster nearest-neighbor lookups, and lower memory consumption. The right setting depends on your corpus and application. A legal document search engine and a product recommendation system have very different tolerances for recall degradation.
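To put rough numbers on the storage side of that tradeoff, the sketch below computes raw vector storage at several dimension settings. It assumes 4-byte float32 components and ignores index overhead such as graph links or metadata; the corpus size of 100 million vectors is hypothetical.

```typescript
const BYTES_PER_FLOAT32 = 4;

// Raw bytes needed to store `vectorCount` vectors of `dims` float32 components.
function rawIndexBytes(vectorCount: number, dims: number): number {
  return vectorCount * dims * BYTES_PER_FLOAT32;
}

const vectorCount = 100_000_000; // hypothetical corpus size
for (const dims of [3072, 1024, 512, 256]) {
  const gigabytes = rawIndexBytes(vectorCount, dims) / 1e9;
  console.log(`${dims} dims: ${gigabytes.toFixed(1)} GB`);
}
// 3072 dims: 1228.8 GB
// 1024 dims: 409.6 GB
// 512 dims: 204.8 GB
// 256 dims: 102.4 GB
```

Dropping from 3072 to 256 dimensions cuts raw vector storage by a factor of 12 under these assumptions.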

What To Consider When Choosing a Provider

Configuration: Embed your corpus at the full 3072 dimensions for archival quality. Then benchmark whether 256, 512, or 1024 dimensions produce acceptable recall for your dataset by shortening copies of the stored vectors to the test size and requesting query embeddings at that same size via the dimensions parameter. Query and document dimensions must match at search time. This lets you tune the accuracy-storage curve without re-embedding the corpus.
Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.

When to Use text-embedding-3-large

Best For

RAG and semantic search: Pipelines where the quality of retrieved passages directly determines output quality
Multilingual retrieval: Search over non-English and mixed-language corpora that benefits from the 23.5-point MIRACL gain over ada-002
Large-scale vector databases: Indexes that benefit from tunable dimensions to balance precision against storage cost
Recommendation systems: Similarity scoring that demands higher embedding fidelity than text-embedding-3-small
Ada-002 migration: Teams that want the maximum quality step-up in a single change

Consider Alternatives When

Tight cost constraint: The smaller variant runs at roughly 6.5x lower cost per token
Short, simple texts: The quality gap between large and small models becomes negligible on simple content
Latency-critical queries: A lighter model fits your SLA better when query-time latency is the bottleneck
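A back-of-the-envelope sketch of the cost tradeoff. It uses the $0.13 per million tokens input price listed on this page and the roughly 6.5x ratio stated above; the smaller model's cost is derived from that ratio rather than from a listed price, and the corpus token count is hypothetical.

```typescript
const LARGE_USD_PER_MILLION_TOKENS = 0.13; // input price listed on this page
const SMALL_COST_RATIO = 6.5; // "roughly 6.5x lower cost per token"

// Cost in USD to embed `tokenCount` tokens at a given per-million-token price.
function embeddingCostUsd(tokenCount: number, usdPerMillionTokens: number): number {
  return (tokenCount / 1_000_000) * usdPerMillionTokens;
}

const corpusTokens = 500_000_000; // hypothetical corpus size in tokens
const largeCost = embeddingCostUsd(corpusTokens, LARGE_USD_PER_MILLION_TOKENS);
const smallCost = largeCost / SMALL_COST_RATIO; // derived estimate, not a listed price

console.log(`large: $${largeCost.toFixed(2)}`); // large: $65.00
console.log(`small: about $${smallCost.toFixed(2)}`); // small: about $10.00
```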

Conclusion

text-embedding-3-large delivers the highest embedding quality in the text-embedding-3 family with the flexibility to shrink vectors when full fidelity isn't required. For retrieval-critical applications on AI Gateway, particularly those spanning multiple languages, it provides a meaningful accuracy step up over text-embedding-3-small.

