
Gemini Embedding 2

Gemini Embedding 2 is Google's first natively multimodal embedding model, mapping text, images, video, audio, and documents into a single unified embedding space with support for interleaved multi-modal inputs and over 100 languages.

index.ts
import { embed } from 'ai';

const result = await embed({
  model: 'google/gemini-embedding-2',
  value: 'Sunny day at the beach',
});

What To Consider When Choosing a Provider

  • Configuration: Because this model embeds multiple modalities into the same vector space, ensure your vector database and retrieval pipeline are configured to handle queries that may originate from a different modality than the indexed documents (for example, text queries against an image corpus).
  • Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
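The first consideration above can be sketched as a modality-agnostic ranking step: because every vector lives in the same space, the retrieval code never branches on where a vector came from. A minimal sketch with hypothetical pre-computed vectors (the `IndexedItem` shape is an illustration, not part of any API):

```typescript
// Hypothetical index entry: the vector may have come from text, an
// image, audio, or video -- ranking treats them all identically.
type IndexedItem = { id: string; modality: string; vector: number[] };

// Plain cosine similarity; Gemini Embedding 2 vectors are comparable
// across modalities, so no per-modality alignment layer is needed.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank indexed items against a query vector, highest similarity first.
function rank(query: number[], items: IndexedItem[]): IndexedItem[] {
  return [...items].sort(
    (x, y) => cosine(query, y.vector) - cosine(query, x.vector),
  );
}
```

The same `rank` call serves a text query against an image corpus or an audio query against PDFs; only the embedding step upstream differs.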

When to Use Gemini Embedding 2

Best For

  • Multimodal RAG pipelines: Indexing corpora that contain a mix of documents, images, audio, and video, and retrieving across all modalities from a single vector store using unified semantic search
  • Cross-modal retrieval: Enabling text queries to surface relevant images, video clips, or audio segments (and vice versa) by embedding all media into the same shared space
  • Rich document understanding: Embedding PDFs with their visual layout, charts, and text together in a single request rather than extracting and embedding text separately
  • Audio search without transcription: Building search systems over audio archives that skip the intermediate transcription step by directly embedding audio content
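For the mixed-corpus case, ingestion usually starts by labeling each file with its modality before embedding everything into one store. A rough sketch; the extension map is illustrative (the audio formats in particular are assumptions, since this page does not list them):

```typescript
// Illustrative only: map files in a mixed corpus to a modality label
// for preprocessing before embedding into a single vector store.
// Image/video/document extensions mirror the formats listed on this
// page; the audio extensions are assumptions.
type Modality = 'text' | 'image' | 'video' | 'audio' | 'document';

const byExtension: Record<string, Modality> = {
  txt: 'text', md: 'text',
  png: 'image', jpg: 'image', jpeg: 'image',
  mp4: 'video', mov: 'video',
  mp3: 'audio', wav: 'audio',
  pdf: 'document',
};

function classify(path: string): Modality | undefined {
  const ext = path.split('.').pop()?.toLowerCase() ?? '';
  return byExtension[ext];
}
```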

Consider Alternatives When

  • Pure text workloads: Your application is text-only and gains nothing from multimodal capabilities; a text-only model such as gemini-embedding-001, with its simpler pricing, may be more appropriate
  • No cross-modal retrieval: You never query across modalities, so a multimodal embedding space adds operational overhead without benefit
  • Generative output needed: You need generated text rather than vector representations of inputs

Conclusion

Gemini Embedding 2 removes the architectural boundary between modalities in embedding pipelines, replacing parallel per-modality indexes with a single unified space that supports direct cross-modal retrieval and semantic comparison. For teams building multimodal search, RAG, and data-organization systems, it provides that unified foundation in a single model.

Frequently Asked Questions

  • What modalities does Gemini Embedding 2 support?

    Text (up to 8,192 tokens), images (up to six per request, PNG and JPEG), video (up to 120 seconds, MP4 and MOV), audio (natively, without intermediate transcription), and documents (PDFs up to six pages).
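A pre-flight check against these limits can catch oversized requests before they hit the API. A sketch; the token count uses a rough characters-per-token heuristic, since this page does not specify the tokenizer:

```typescript
// Pre-flight validation against the limits stated above. The token
// count is a ~4-characters-per-token approximation, not an exact
// tokenizer; treat it as a heuristic.
interface EmbedRequest {
  text?: string;
  imageCount?: number;   // PNG/JPEG, max 6 per request
  videoSeconds?: number; // MP4/MOV, max 120 seconds
  pdfPages?: number;     // max 6 pages
}

function validate(req: EmbedRequest): string[] {
  const errors: string[] = [];
  if (req.text && Math.ceil(req.text.length / 4) > 8192) {
    errors.push('text likely exceeds the 8,192-token limit');
  }
  if ((req.imageCount ?? 0) > 6) errors.push('more than 6 images');
  if ((req.videoSeconds ?? 0) > 120) errors.push('video longer than 120 seconds');
  if ((req.pdfPages ?? 0) > 6) errors.push('PDF longer than 6 pages');
  return errors;
}
```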

  • What does it mean that all modalities share a single embedding space?

    Vectors produced from text, images, video, audio, and documents are directly comparable. A text query can retrieve semantically relevant images, or an audio clip can be compared to a PDF. No cross-modal alignment layers on top of separate per-modality models are needed.

  • Can I pass multiple modalities in a single embedding request?

    Yes. The model natively understands interleaved input, so you can pass an image and its text caption together. It captures the relationships between modalities in a single embedding.
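The snippet at the top of this page passes a plain string; how interleaved multimodal values are expressed on the wire is provider-specific and not documented here. The part shape below is a hypothetical illustration of pairing an image with its caption, not the documented API:

```typescript
// Hypothetical shape for an interleaved input -- the real request
// format is provider-specific; check the provider docs before use.
type Part =
  | { type: 'text'; text: string }
  | { type: 'image'; data: Uint8Array; mimeType: 'image/png' | 'image/jpeg' };

// Pair an image with its caption so both can be embedded together,
// letting the model capture their relationship in one vector.
function interleave(caption: string, image: Uint8Array): Part[] {
  return [
    { type: 'image', data: image, mimeType: 'image/png' },
    { type: 'text', text: caption },
  ];
}
```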

  • How does the text context window in Gemini Embedding 2 compare to gemini-embedding-001?

    Gemini Embedding 2 supports up to 8,192 input tokens for text, four times the 2,048-token limit of gemini-embedding-001, making it better suited for embedding longer documents.
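Even with the larger window, documents beyond 8,192 tokens still need chunking before embedding. A minimal sketch; the 4-characters-per-token ratio is a heuristic assumption, since the exact tokenizer is not specified here:

```typescript
// Split a long document into chunks that should fit the 8,192-token
// window, using a rough 4-characters-per-token heuristic.
function chunk(text: string, maxTokens = 8192, charsPerToken = 4): string[] {
  const maxChars = maxTokens * charsPerToken;
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += maxChars) {
    chunks.push(text.slice(i, i + maxChars));
  }
  return chunks;
}
```

In production you would split on sentence or paragraph boundaries rather than raw character offsets; this only shows the budget arithmetic.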

  • Does Gemini Embedding 2 use Matryoshka Representation Learning?

    Yes. Like gemini-embedding-001, it uses MRL to allow output dimensions to scale down from the default 3,072. Google recommends 3,072, 1,536, or 768 for highest quality results.
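In practice, MRL means a shorter vector is obtained by keeping the leading dimensions and re-normalizing, which is standard Matryoshka usage (a sketch of the convention, not an API this page documents):

```typescript
// With Matryoshka-trained embeddings, the leading dimensions carry the
// most information, so a smaller vector is just a prefix of the full
// one. Re-normalizing keeps cosine similarity well-behaved.
function truncateEmbedding(vector: number[], dims: 3072 | 1536 | 768): number[] {
  const cut = vector.slice(0, dims);
  const norm = Math.sqrt(cut.reduce((s, v) => s + v * v, 0));
  return norm === 0 ? cut : cut.map((v) => v / norm);
}
```

This lets you trade storage and query latency for quality after the fact, without re-embedding the corpus at a different dimension.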

  • What vector database and framework integrations are available?

    Supported integrations include LangChain, LlamaIndex, Haystack, Weaviate, Qdrant, ChromaDB, and Vector Search.

  • What does Gemini Embedding 2 cost on AI Gateway?

    Pricing appears on this page and updates as providers adjust their rates. AI Gateway routes traffic through the configured provider.