text-embedding-3-small

text-embedding-3-small delivers higher MTEB scores than ada-002 at lower cost, with a 1536-dimension default that drops into existing pipelines and a flexible dimensions parameter for further storage savings.

index.ts
import { embed } from 'ai';

const result = await embed({
  model: 'openai/text-embedding-3-small',
  value: 'Sunny day at the beach',
});

About text-embedding-3-small

OpenAI announced text-embedding-3-small on January 25, 2024, alongside the larger text-embedding-3-large. It costs less per token than ada-002 and scores 62.3% on MTEB (Massive Text Embedding Benchmark), 1.3 points higher than the model it replaces.

Multilingual retrieval improves substantially too. text-embedding-3-small reaches 44.0% on MIRACL versus ada-002's 31.4%, a 12.6-point gain that matters for any pipeline handling non-English or mixed-language content.

Like text-embedding-3-large, text-embedding-3-small supports the dimensions parameter via Matryoshka training. The default 1536-dimension output matches ada-002 for drop-in compatibility, but you can reduce it when memory or storage costs are a concern. Semantic structure is front-loaded into the earlier dimensions, so shorter vectors still carry meaningful signal.
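
Because the signal is front-loaded, a stored full-length vector can also be shortened after the fact by keeping its first values and rescaling to unit length. The following is a minimal sketch of that idea, not the gateway's or OpenAI's implementation. The helper name truncateEmbedding and the toy 4-value vector are illustrative assumptions.

```typescript
// Sketch: shorten an embedding client-side, then restore unit length.
// `truncateEmbedding` is a hypothetical helper, not part of any SDK.
function truncateEmbedding(embedding: number[], dimensions: number): number[] {
  if (!Number.isInteger(dimensions) || dimensions < 1 || dimensions > embedding.length) {
    throw new RangeError(`dimensions must be an integer between 1 and ${embedding.length}`);
  }
  // Keep only the leading values, where Matryoshka-trained models concentrate signal.
  const head = embedding.slice(0, dimensions);
  // L2 norm of the shortened vector.
  const norm = Math.sqrt(head.reduce((sum, x) => sum + x * x, 0));
  // An all-zero prefix cannot be normalized; return it unchanged.
  return norm === 0 ? head : head.map((x) => x / norm);
}

// Toy stand-in for a 1536-dimension vector.
const fullVector = [0.6, 0.0, 0.8, 0.0];
const shortVector = truncateEmbedding(fullVector, 2);
console.log(shortVector.length); // 2
```

Rescaling matters because cosine and dot-product search usually assume unit-length vectors, and a truncated vector is no longer unit length on its own.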

text-embedding-3-small fits well in the query-time embedding path of Retrieval-Augmented Generation (RAG) architectures. Every user query must be embedded before retrieval, and at scale that per-query cost and latency compounds. Low cost and fast inference make it a natural fit for that leg of the pipeline.
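
The retrieval step that follows query embedding can be sketched as a cosine-similarity ranking over stored chunk vectors. This is a brute-force illustration under stated assumptions: the IndexedChunk shape, the topK helper, and the 3-value toy vectors are all hypothetical, and a production system would typically delegate this step to a vector database.

```typescript
// Sketch: rank stored chunks against a query embedding by cosine similarity.
interface IndexedChunk {
  id: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // Treat a zero vector as having no similarity to anything.
  return denom === 0 ? 0 : dot / denom;
}

// Hypothetical helper: return the k highest-scoring chunks, best first.
function topK(query: number[], chunks: IndexedChunk[], k: number) {
  return chunks
    .map((chunk) => ({ id: chunk.id, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy vectors standing in for real embeddings.
const storedChunks: IndexedChunk[] = [
  { id: 'tax-forms', embedding: [0, 1, 0] },
  { id: 'beach-guide', embedding: [1, 0, 0] },
  { id: 'surf-report', embedding: [0.7, 0, 0.7] },
];
const queryEmbedding = [0.9, 0.1, 0];
const hits = topK(queryEmbedding, storedChunks, 2);
console.log(hits.map((h) => h.id)); // [ 'beach-guide', 'surf-report' ]
```

In a real pipeline, queryEmbedding would come from the embed call shown at the top of this page, made once per user query.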

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

Provider    Input      Release Date    Legal
Azure       $0.02/M    01/25/2024      Terms, Privacy
OpenAI      $0.02/M    01/25/2024      Terms, Privacy
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.

What To Consider When Choosing a Provider

  • Configuration: Because the default output is 1536 dimensions, identical to ada-002, you can swap models without touching your vector database schema or index configuration.
  • Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.

When to Use text-embedding-3-small

Best For

  • Ada-002 drop-in replacement: No schema changes and immediate cost savings with the same 1536-dimension default
  • Real-time RAG pipelines: Query-time embedding where latency and cost per request matter
  • High-volume indexing: Document pipelines where the reduced cost yields significant infrastructure savings
  • Multilingual retrieval: Applications that benefit from the 12.6-point MIRACL improvement over ada-002
  • Budget-conscious projects: Embedding quality matters, but you do not need the highest score available
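
To put the high-volume indexing case in numbers, the sketch below estimates embedding spend at the $0.02/M input price listed on this page, plus raw vector storage at different dimension counts. The corpus size, the 500-token chunk length, and the 4-bytes-per-value (float32) storage format are assumptions for illustration. Real indexes add overhead on top of the raw vector bytes.

```typescript
// Sketch: back-of-the-envelope cost and storage estimates.
// Price taken from the provider table on this page; everything else is assumed.
const PRICE_PER_MILLION_TOKENS_USD = 0.02;

function embeddingCostUsd(
  totalTokens: number,
  pricePerMillion: number = PRICE_PER_MILLION_TOKENS_USD,
): number {
  return (totalTokens / 1_000_000) * pricePerMillion;
}

// Raw vector payload only; assumes float32 values (4 bytes each) by default.
function rawVectorBytes(vectorCount: number, dimensions: number, bytesPerValue = 4): number {
  return vectorCount * dimensions * bytesPerValue;
}

// Hypothetical corpus: 1,000,000 chunks of 500 tokens each.
const chunkCount = 1_000_000;
const totalTokens = chunkCount * 500;

const indexingCost = embeddingCostUsd(totalTokens); // about $10
const bytesAtDefault = rawVectorBytes(chunkCount, 1536); // 6,144,000,000 bytes
const bytesAtReduced = rawVectorBytes(chunkCount, 512); // 2,048,000,000 bytes

console.log(`cost: $${indexingCost.toFixed(2)}`);
console.log(`storage: ${bytesAtDefault / 1e9} GB vs ${bytesAtReduced / 1e9} GB`);
```

Under these assumptions, dropping from 1536 to 512 dimensions cuts raw vector storage to one third, while the embedding cost depends only on token count.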

Consider Alternatives When

  • Retrieval accuracy bottleneck: The 2.3-point MTEB gap versus text-embedding-3-large is material for your use case
  • Heavily multilingual corpus: The larger model's 54.9% MIRACL score versus 44.0% would produce meaningfully better results
  • Maximum dimensionality: Specialized downstream models need the full 3072-dimension vectors from text-embedding-3-large

Conclusion

text-embedding-3-small is the practical default embedding model for most applications on AI Gateway. It costs less than ada-002, scores higher on both MTEB and MIRACL, and keeps the same 1536-dimension default, so your vector schema does not change. Start here unless you have a specific reason to pay for the large variant's extra accuracy.

Frequently Asked Questions

  • Is text-embedding-3-small a direct replacement for ada-002?

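    Yes, at the schema level. The default output is 1536 dimensions, the same as ada-002, so your vector database schema and index configuration carry over unchanged. You still need to re-embed stored documents, because vectors produced by different models are not comparable. In return you get a higher MTEB score and lower per-token cost.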

  • How much does the multilingual retrieval improve over ada-002?

    MIRACL scores go from 31.4% to 44.0%. For pipelines that handle queries or documents in multiple languages, this is a meaningful quality improvement that comes free with the model swap.

  • When does it make sense to pay for text-embedding-3-large instead?

    When embedding accuracy is the bottleneck on your application's quality. Examples include legal search, scientific literature retrieval, and high-stakes recommendation systems, where the 2.3-point MTEB difference translates to noticeably better results.

  • Can I reduce the vector dimensions below 1536?

    Yes. The dimensions parameter accepts any value below the default. Matryoshka training ensures the truncated vectors retain useful semantic structure, which is helpful for reducing storage costs in large indexes.

  • What are typical latency characteristics?

    This page shows live throughput and time-to-first-token metrics measured across real AI Gateway embedding traffic.