Cohere Rerank 4 Pro

Cohere Rerank 4 Pro is a multilingual reranking model from Cohere built for state-of-the-art relevance on complex queries over English and non-English documents and semi-structured JSON.

Rerank

index.ts

import { rerank } from 'ai';

// The model id routes the request through AI Gateway to Cohere.
const result = await rerank({
  model: 'cohere/rerank-v4-pro',
  query: 'What is the capital of France?',
  // Candidate documents to score against the query.
  documents: [
    'Paris is the capital of France.',
    'Berlin is the capital of Germany.',
    'Madrid is the capital of Spain.',
  ],
});

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

Provider: Cohere (Legal: Terms, Privacy)
Context: 32K
Input: $2.50/K
Release Date: 12/11/2025

More models by Cohere

cohere/rerank-v4-fast: Context 32K, Input $2/K, Release Date 12/11/2025
cohere/embed-v4.0: Context 128K, Input $0.12/M, Release Date 04/15/2025
cohere/command-a: Context 256K, Latency 0.2s, Throughput 73tps, Input $2.50/M, Output $10/M, Release Date 03/13/2025
cohere/rerank-v3.5: Input $2/K, Release Date 12/02/2024

About Cohere Rerank 4 Pro

Cohere Rerank 4 Pro is the quality tier of the Rerank 4 generation, released December 11, 2025 alongside rerank-v4-fast. Cohere positions it as the strongest Cohere reranker yet, aimed at enterprise search and RAG pipelines where ranking accuracy on complex queries directly drives downstream outcomes.

Reranking is a cross-encoder step. Cohere Rerank 4 Pro reads the query and each candidate document together with full attention, scoring relevance in a way that bi-encoder embedding similarity cannot. Multi-part queries, queries with conditions, and queries that hinge on a single phrase inside a long document benefit most from this setup.

Multilingual coverage spans more than 100 languages. A query in one language can match documents in another within the same rerank call, so a single index can serve a global user base without separate per-language pipelines. Document types include long-form text, tables, code, and semi-structured JSON.

In a RAG pipeline, the common pattern is to retrieve 50 to 200 candidates with an embedding model, then rerank them with Cohere Rerank 4 Pro down to a top-k of 5 to 20 documents handed to the generative model. The reranker reduces noise in the LLM context, which often improves answer quality more than swapping in a larger generative model would.
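The two-stage pattern described above can be sketched as follows. This is a minimal illustration, not the AI SDK or Cohere API: the `Scored`, `RetrieveFn`, and `RerankFn` types and both function names are hypothetical, and the retriever and reranker are passed in as callbacks so the selection logic stays independent of any provider.

```typescript
// Hypothetical shapes for illustration; they are not taken from any SDK.
type Scored = { index: number; score: number };
type RetrieveFn = (query: string, limit: number) => Promise<string[]>;
type RerankFn = (query: string, documents: string[]) => Promise<Scored[]>;

// Keep the topK candidates with the highest reranker scores, best first.
function selectTopK(candidates: string[], scores: Scored[], topK: number): string[] {
  return [...scores]
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((s) => candidates[s.index])
    .filter((doc): doc is string => doc !== undefined);
}

async function retrieveThenRerank(
  query: string,
  retrieve: RetrieveFn,
  rerankDocs: RerankFn,
  candidateCount = 100, // first-pass size, within the 50 to 200 range above
  topK = 10, // documents handed to the generative model
): Promise<string[]> {
  // Stage 1: recall-oriented retrieval (embeddings, BM25, or hybrid).
  const candidates = await retrieve(query, candidateCount);
  if (candidates.length === 0) return [];
  // Stage 2: the reranker scores every candidate against the query.
  const scores = await rerankDocs(query, candidates);
  return selectTopK(candidates, scores, topK);
}

const demo = selectTopK(
  ['a', 'b', 'c'],
  [
    { index: 0, score: 0.1 },
    { index: 1, score: 0.9 },
    { index: 2, score: 0.5 },
  ],
  2,
);
console.log(demo); // [ 'b', 'c' ]
```

In a real pipeline, `rerankDocs` would wrap a rerank call to cohere/rerank-v4-pro and `retrieve` would wrap the vector or keyword index.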

See https://cohere.com/blog/rerank-4 for the API contract. Reranking is billed per search query, so cost scales with traffic rather than document length or token count.
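As a rough illustration of per-query billing, the estimate below multiplies query volume by the listed price. It assumes that the "/K" in the $2.50/K price shown for this model means per 1,000 search queries; the function name and the traffic figure are hypothetical, and how a provider counts a single query with many documents is not covered here.

```typescript
// Assumption: "/K" means per 1,000 search queries.
function estimateRerankCostUsd(queries: number, pricePerThousandUsd: number): number {
  return (queries / 1000) * pricePerThousandUsd;
}

// Hypothetical traffic: 40,000 search queries in a month at $2.50/K.
const monthlyCostUsd = estimateRerankCostUsd(40_000, 2.5);
console.log(monthlyCostUsd); // 100
```

Under this model, doubling the length of each document leaves the estimate unchanged, while doubling traffic doubles it.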

What To Consider When Choosing a Provider

Configuration: Rerankers refine an existing candidate set; they don't replace retrieval. Pair Cohere Rerank 4 Pro with an embedding model, BM25, or hybrid retriever for the first pass. The per-document context of 32K tokens covers query and document tokens together, so very long documents may need chunking before they reach the model.
Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
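The chunking mentioned in the Configuration note can be sketched as below. This is a simplified example under stated assumptions: it estimates tokens at roughly 4 characters each, which is a crude heuristic for English text and not the model's tokenizer, and it splits on fixed character offsets where a production pipeline would more likely split on paragraph or section boundaries. The function and constant names are hypothetical.

```typescript
// Assumption: about 4 characters per token. This is a rough heuristic only.
const CHARS_PER_TOKEN = 4;

// Split a document so that query tokens plus chunk tokens fit within maxTokens.
function chunkForRerank(document: string, query: string, maxTokens = 32_000): string[] {
  const queryTokens = Math.ceil(query.length / CHARS_PER_TOKEN);
  const budgetChars = (maxTokens - queryTokens) * CHARS_PER_TOKEN;
  if (budgetChars <= 0) {
    throw new Error('The query alone uses the whole token budget');
  }
  const chunks: string[] = [];
  for (let start = 0; start < document.length; start += budgetChars) {
    chunks.push(document.slice(start, start + budgetChars));
  }
  return chunks;
}

// Tiny budget for demonstration: 11 tokens total, 1 of them used by the query.
const parts = chunkForRerank('a'.repeat(100), 'abcd', 11);
console.log(parts.map((p) => p.length)); // [ 40, 40, 20 ]
```

Each chunk is then passed to the reranker as its own candidate document, so the final ordering is over passages and not whole files.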

When to Use Cohere Rerank 4 Pro

Best for

High-quality enterprise search: Top-k selection drives downstream outcomes in legal, financial, and support contexts
Complex query reasoning: Multi-part and conditional queries benefit from full cross-attention
Global multilingual RAG: One reranker covers more than 100 languages including cross-lingual matching
Long-document corpora: Chunked passages get precise final ordering before they hit the LLM context
Hybrid retrieval merging: BM25 and vector candidates share a single relevance score
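For the hybrid retrieval case above, BM25 scores and vector similarities are not on comparable scales, so one simple approach is to discard the first-pass scores, merge the candidate lists, and let the reranker assign the single relevance score. The sketch below shows only the merge step; `mergeCandidates` is a hypothetical helper, and deduplicating on exact string equality is an assumption that real pipelines often replace with document ids.

```typescript
// Merge candidate lists from several retrievers, keeping first-seen order
// and dropping exact duplicates.
function mergeCandidates(...lists: string[][]): string[] {
  const seen = new Set<string>();
  const merged: string[] = [];
  for (const list of lists) {
    for (const doc of list) {
      if (!seen.has(doc)) {
        seen.add(doc);
        merged.push(doc);
      }
    }
  }
  return merged;
}

const bm25Hits = ['doc A', 'doc B'];
const vectorHits = ['doc B', 'doc C'];
console.log(mergeCandidates(bm25Hits, vectorHits)); // [ 'doc A', 'doc B', 'doc C' ]
```

The merged list is what gets sent as the documents of a single rerank call.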

Consider alternatives when

Latency-critical pipelines: rerank-v4-fast is tuned for lower response time and higher throughput
English-only corpora: rerank-v3.5 covers English RAG at a lower per-query price
First-pass alone is enough: The accuracy bar is met without a second stage
Multimodal retrieval: Pick a model with native image inputs

Conclusion

Cohere Rerank 4 Pro is the right reranker when ranking accuracy on complex multilingual queries is what the application is judged on. Route it through AI Gateway with model id cohere/rerank-v4-pro to share billing and observability with the embedding and generative models in the same RAG pipeline.
