text-embedding-3-small

text-embedding-3-small delivers higher MTEB scores than ada-002 at lower cost, with a 1536-dimension default that drops into existing pipelines and a flexible dimensions parameter for further storage savings.

index.ts
import { embed } from 'ai';

const result = await embed({
  model: 'openai/text-embedding-3-small',
  value: 'Sunny day at the beach',
});

About text-embedding-3-small

OpenAI announced text-embedding-3-small on January 25, 2024, alongside its larger sibling, text-embedding-3-large. It costs less per token than ada-002 and scores 62.3% on MTEB (Massive Text Embedding Benchmark), 1.3 points higher than the model it replaces.

Multilingual retrieval improves substantially too. text-embedding-3-small reaches 44.0% on MIRACL versus ada-002's 31.4%, a 12.6-point gain that matters for any pipeline handling non-English or mixed-language content.

Like text-embedding-3-large, text-embedding-3-small supports the dimensions parameter via Matryoshka training. The default 1,536-dimension output matches ada-002 for drop-in compatibility, but you can reduce it when memory or storage costs are a concern. Semantic structure is front-loaded into the earlier dimensions, so shorter vectors still carry meaningful signal.
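To make the storage trade-off concrete, here is a minimal sketch of shortening a vector on the client after it has been generated. It assumes you already hold a full-length embedding as a plain number array. The helper names truncateAndNormalize and float32Bytes are hypothetical and not part of any SDK. Requesting a smaller size through the dimensions parameter at embedding time is generally the simpler route; this sketch only illustrates why a prefix of the vector remains usable once it is rescaled to unit length.

```typescript
// Hypothetical helper (not an SDK function): keep the first `dimensions`
// components of an embedding and rescale the result to unit L2 length.
function truncateAndNormalize(embedding: number[], dimensions: number): number[] {
  if (!Number.isInteger(dimensions) || dimensions <= 0 || dimensions > embedding.length) {
    throw new RangeError(`dimensions must be an integer between 1 and ${embedding.length}`);
  }
  const head = embedding.slice(0, dimensions);
  const norm = Math.sqrt(head.reduce((sum, x) => sum + x * x, 0));
  // An all-zero prefix cannot be normalized; return it unchanged.
  if (norm === 0) return head;
  return head.map((x) => x / norm);
}

// Hypothetical helper: bytes needed to store one vector as 32-bit floats.
function float32Bytes(dimensions: number): number {
  return dimensions * 4;
}
```

Under the float32 assumption, a 1,536-dimension vector takes 6,144 bytes and a 512-dimension vector takes 2,048 bytes, so truncating to 512 dimensions cuts per-vector storage to one third. Any change in retrieval quality should be measured on your own data before committing to a smaller size.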

text-embedding-3-small fits well in the query-time embedding path of Retrieval-Augmented Generation (RAG) architectures. Every user query must be embedded before retrieval, and at scale the per-query cost and latency compound. Low cost and fast inference make the model well suited to that leg of the pipeline.
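As a rough illustration of the retrieval step that follows query embedding, the sketch below ranks pre-embedded documents by cosine similarity to a query vector. It assumes the query and document vectors already exist and share the same length. The IndexedDoc type and topK function are hypothetical names for this example, and a production system would more likely delegate the search to a vector database than scan an in-memory array.

```typescript
// Hypothetical shape for a document whose embedding was computed ahead of time.
interface IndexedDoc {
  id: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors; returns 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every document against the query and return the k best matches, highest score first.
function topK(query: number[], docs: IndexedDoc[], k: number): { id: string; score: number }[] {
  return docs
    .map((doc) => ({ id: doc.id, score: cosineSimilarity(query, doc.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, Math.max(0, k));
}
```

In this arrangement the document embeddings are computed once at indexing time, while the query vector is produced on every request, which is why per-query cost and latency dominate this leg of the pipeline.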