
Gemini Embedding 2

Gemini Embedding 2 is Google's first natively multimodal embedding model, mapping text, images, video, audio, and documents into a single unified embedding space with support for interleaved multi-modal inputs and over 100 languages.

index.ts
import { embed } from 'ai';

const result = await embed({
  model: 'google/gemini-embedding-2',
  value: 'Sunny day at the beach',
});

What To Consider When Choosing a Provider

  • Configuration: Because this model embeds multiple modalities into the same vector space, ensure your vector database and retrieval pipeline are configured to handle queries that may originate from a different modality than the indexed documents (for example, text queries against an image corpus).
  • Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
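The first consideration above can be sketched as a modality-agnostic ranking step: because every vector lives in the same space, the retrieval code never branches on where a vector came from. A minimal sketch with hypothetical pre-computed vectors (the `IndexedItem` shape is an illustration, not part of any API):

```typescript
// Hypothetical index entry: the vector may have come from text, an
// image, audio, or video -- ranking treats them all identically.
type IndexedItem = { id: string; modality: string; vector: number[] };

// Plain cosine similarity; Gemini Embedding 2 vectors are comparable
// across modalities, so no per-modality alignment layer is needed.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank indexed items against a query vector, highest similarity first.
function rank(query: number[], items: IndexedItem[]): IndexedItem[] {
  return [...items].sort(
    (x, y) => cosine(query, y.vector) - cosine(query, x.vector),
  );
}
```

The same `rank` call serves a text query against an image corpus or an audio query against PDFs; only the embedding step upstream differs.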

When to Use Gemini Embedding 2

Best For

  • Multimodal RAG pipelines: Indexing corpora that contain a mix of documents, images, audio, and video, and retrieving across all modalities from a single vector store using unified semantic search
  • Cross-modal retrieval: Enabling text queries to surface relevant images, video clips, or audio segments (and vice versa) by embedding all media into the same shared space
  • Rich document understanding: Embedding PDFs with their visual layout, charts, and text together in a single request rather than extracting and embedding text separately
  • Audio search without transcription: Building search systems over audio archives that skip the intermediate transcription step by directly embedding audio content
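For the mixed-corpus case, ingestion usually starts by labeling each file with its modality before embedding everything into one store. A rough sketch; the extension map is illustrative (the audio formats in particular are assumptions, since this page does not list them):

```typescript
// Illustrative only: map files in a mixed corpus to a modality label
// for preprocessing before embedding into a single vector store.
// Image/video/document extensions mirror the formats listed on this
// page; the audio extensions are assumptions.
type Modality = 'text' | 'image' | 'video' | 'audio' | 'document';

const byExtension: Record<string, Modality> = {
  txt: 'text', md: 'text',
  png: 'image', jpg: 'image', jpeg: 'image',
  mp4: 'video', mov: 'video',
  mp3: 'audio', wav: 'audio',
  pdf: 'document',
};

function classify(path: string): Modality | undefined {
  const ext = path.split('.').pop()?.toLowerCase() ?? '';
  return byExtension[ext];
}
```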

Consider Alternatives When

  • Pure text workloads: Your application is text-only and gains nothing from multimodal capabilities; a text-only model such as gemini-embedding-001, with its simpler pricing, may be more appropriate
  • No cross-modal retrieval: You never query across modalities, so a multimodal embedding space adds operational overhead without benefit
  • Generative output needed: You need generated text rather than vector representations of inputs

Conclusion

Gemini Embedding 2 removes the architectural boundary between modalities in embedding pipelines, replacing parallel per-modality indexes with a single unified space that supports direct cross-modal retrieval and semantic comparison. For teams building multimodal search, RAG, and data-organization systems, it provides that unified foundation in a single model.

Frequently Asked Questions

  • What modalities does Gemini Embedding 2 support?

    Text (up to 8,192 tokens), images (up to six per request, PNG and JPEG), video (up to 120 seconds, MP4 and MOV), audio (natively, without intermediate transcription), and documents (PDFs up to six pages).
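A pre-flight check against these limits can catch oversized requests before they hit the API. A sketch; the token count uses a rough characters-per-token heuristic, since this page does not specify the tokenizer:

```typescript
// Pre-flight validation against the limits stated above. The token
// count is a ~4-characters-per-token approximation, not an exact
// tokenizer; treat it as a heuristic.
interface EmbedRequest {
  text?: string;
  imageCount?: number;   // PNG/JPEG, max 6 per request
  videoSeconds?: number; // MP4/MOV, max 120 seconds
  pdfPages?: number;     // max 6 pages
}

function validate(req: EmbedRequest): string[] {
  const errors: string[] = [];
  if (req.text && Math.ceil(req.text.length / 4) > 8192) {
    errors.push('text likely exceeds the 8,192-token limit');
  }
  if ((req.imageCount ?? 0) > 6) errors.push('more than 6 images');
  if ((req.videoSeconds ?? 0) > 120) errors.push('video longer than 120 seconds');
  if ((req.pdfPages ?? 0) > 6) errors.push('PDF longer than 6 pages');
  return errors;
}
```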

  • What does it mean that all modalities share a single embedding space?

    Vectors produced from text, images, video, audio, and documents are directly comparable. A text query can retrieve semantically relevant images, or an audio clip can be compared to a PDF. No cross-modal alignment layers on top of separate per-modality models are needed.

  • Can I pass multiple modalities in a single embedding request?

    Yes. The model natively understands interleaved input, so you can pass an image and its text caption together. It captures the relationships between modalities in a single embedding.
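The snippet at the top of this page passes a plain string; how interleaved multimodal values are expressed on the wire is provider-specific and not documented here. The part shape below is a hypothetical illustration of pairing an image with its caption, not the documented API:

```typescript
// Hypothetical shape for an interleaved input -- the real request
// format is provider-specific; check the provider docs before use.
type Part =
  | { type: 'text'; text: string }
  | { type: 'image'; data: Uint8Array; mimeType: 'image/png' | 'image/jpeg' };

// Pair an image with its caption so both can be embedded together,
// letting the model capture their relationship in one vector.
function interleave(caption: string, image: Uint8Array): Part[] {
  return [
    { type: 'image', data: image, mimeType: 'image/png' },
    { type: 'text', text: caption },
  ];
}
```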

  • How does the text context window in Gemini Embedding 2 compare to gemini-embedding-001?

    Gemini Embedding 2 supports up to 8,192 input tokens for text, four times the 2,048-token limit of gemini-embedding-001, making it better suited for embedding longer documents.
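Even with the larger window, documents beyond 8,192 tokens still need chunking before embedding. A minimal sketch; the 4-characters-per-token ratio is a heuristic assumption, since the exact tokenizer is not specified here:

```typescript
// Split a long document into chunks that should fit the 8,192-token
// window, using a rough 4-characters-per-token heuristic.
function chunk(text: string, maxTokens = 8192, charsPerToken = 4): string[] {
  const maxChars = maxTokens * charsPerToken;
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += maxChars) {
    chunks.push(text.slice(i, i + maxChars));
  }
  return chunks;
}
```

In production you would split on sentence or paragraph boundaries rather than raw character offsets; this only shows the budget arithmetic.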

  • Does Gemini Embedding 2 use Matryoshka Representation Learning?

    Yes. Like gemini-embedding-001, it uses MRL to allow output dimensions to scale down from the default 3,072. Google recommends 3,072, 1,536, or 768 for highest quality results.
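In practice, MRL means a shorter vector is obtained by keeping the leading dimensions and re-normalizing, which is standard Matryoshka usage (a sketch of the convention, not an API this page documents):

```typescript
// With Matryoshka-trained embeddings, the leading dimensions carry the
// most information, so a smaller vector is just a prefix of the full
// one. Re-normalizing keeps cosine similarity well-behaved.
function truncateEmbedding(vector: number[], dims: 3072 | 1536 | 768): number[] {
  const cut = vector.slice(0, dims);
  const norm = Math.sqrt(cut.reduce((s, v) => s + v * v, 0));
  return norm === 0 ? cut : cut.map((v) => v / norm);
}
```

This lets you trade storage and query latency for quality after the fact, without re-embedding the corpus at a different dimension.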

  • What vector database and framework integrations are available?

    Supported integrations include LangChain, LlamaIndex, Haystack, Weaviate, Qdrant, ChromaDB, and Vector Search.

  • What does Gemini Embedding 2 cost on AI Gateway?

    Pricing appears on this page and updates as providers adjust their rates. AI Gateway routes traffic through the configured provider.