Cohere Rerank 4 Pro

Cohere Rerank 4 Pro is a multilingual reranking model from Cohere built for state-of-the-art relevance on complex queries over English and non-English documents and semi-structured JSON.

index.ts
import { rerank } from 'ai';

const result = await rerank({
  model: 'cohere/rerank-v4-pro',
  query: 'What is the capital of France?',
  documents: [
    'Paris is the capital of France.',
    'Berlin is the capital of Germany.',
    'Madrid is the capital of Spain.',
  ],
});

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

Provider | Context | Per Query | Release Date
Cohere | 32K | $2.5/K | 12/11/2025

More models by Cohere

Context | Latency | Throughput | Input | Output | Per Query | Providers | Release Date
32K | - | - | - | - | $2/K | Cohere | 12/11/2025
128K | - | - | $0.12/M | - | - | Bedrock, Cohere | 04/15/2025
256K | 0.3s | 69tps | $2.50/M | $10.00/M | - | Cohere | 03/13/2025
4K | - | - | - | - | $2/K | Bedrock | 12/02/2024

About Cohere Rerank 4 Pro

Cohere Rerank 4 Pro is the quality tier of the Rerank 4 generation, released December 11, 2025, alongside rerank-v4-fast. Cohere positions it as its strongest reranker yet, aimed at enterprise search and RAG pipelines where ranking accuracy on complex queries directly drives downstream outcomes.

Reranking is a cross-encoder step. Cohere Rerank 4 Pro reads the query and each candidate document together with full attention, scoring relevance in a way that bi-encoder embedding similarity cannot. Multi-part queries, queries with conditions, and queries that hinge on a single phrase inside a long document benefit most from this setup.
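The structural difference between the two approaches can be sketched as TypeScript interfaces. This is only an illustration: the interface names and the `cosine` helper are hypothetical and say nothing about how Cohere implements the model.

```typescript
// Bi-encoder: query and document are embedded independently, then compared.
// The document vector can be precomputed, but it is built without seeing the query.
interface BiEncoder {
  embed(text: string): number[];
}

// Cross-encoder: query and document are read together and scored in one pass.
interface CrossEncoder {
  score(query: string, document: string): number;
}

// Cosine similarity between two vectors (0 when either vector is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Bi-encoder relevance: similarity of two separately computed embeddings.
function biEncoderScore(model: BiEncoder, query: string, document: string): number {
  return cosine(model.embed(query), model.embed(document));
}

// Cross-encoder relevance: one joint model call per (query, document) pair.
function crossEncoderScore(model: CrossEncoder, query: string, document: string): number {
  return model.score(query, document);
}
```

The cost of the joint call is one model pass per candidate, which is why cross-encoders are normally applied to a short candidate list rather than a whole index.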

Multilingual coverage spans more than 100 languages. A query in one language can match documents in another within the same rerank call, so a single index can serve a global user base without separate per-language pipelines. Document types include long-form text, tables, code, and semi-structured JSON.

In a RAG pipeline, the common pattern is to retrieve 50 to 200 candidates with an embedding model, then rerank them with Cohere Rerank 4 Pro down to a top-k of 5 to 20 documents handed to the generative model. The reranker reduces noise in the LLM context, which often improves answer quality more than swapping in a larger generative model would.
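The two-stage pattern can be sketched as follows. This is a minimal sketch under stated assumptions: `Scorer`, `rankTop`, `retrieveThenRerank`, and the two toy scoring functions are hypothetical stand-ins, with `wordOverlap` playing the role of the first-pass retriever and `phraseMatch` playing the role of the reranker.

```typescript
// Hypothetical scoring function: higher means more relevant.
type Scorer = (query: string, document: string) => number;

interface Ranked {
  document: string;
  score: number;
}

// Score every document and keep the k best, highest score first.
function rankTop(query: string, documents: string[], scorer: Scorer, k: number): Ranked[] {
  return documents
    .map((document) => ({ document, score: scorer(query, document) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Stage 1 narrows the corpus to a candidate set; stage 2 reorders only those candidates.
function retrieveThenRerank(
  query: string,
  corpus: string[],
  firstPass: Scorer,
  reranker: Scorer,
  candidateCount = 100,
  finalCount = 10,
): Ranked[] {
  const candidates = rankTop(query, corpus, firstPass, candidateCount).map((r) => r.document);
  return rankTop(query, candidates, reranker, finalCount);
}

// Toy first pass: number of query words that also appear in the document.
const wordOverlap: Scorer = (query, document) => {
  const words = new Set(document.toLowerCase().split(/\W+/));
  return query.toLowerCase().split(/\W+/).filter((w) => w.length > 0 && words.has(w)).length;
};

// Toy second pass: 1 if the document contains the whole query phrase, else 0.
const phraseMatch: Scorer = (query, document) =>
  document.toLowerCase().includes(query.toLowerCase()) ? 1 : 0;

const corpus = [
  'Berlin is the capital of Germany.',
  'France has a capital.',
  'Paris is the capital of France.',
  'Bananas are yellow.',
];

const top = retrieveThenRerank('capital of France', corpus, wordOverlap, phraseMatch, 3, 1);
console.log(top[0]?.document); // prints "Paris is the capital of France."
```

In production the second stage would be an asynchronous call to the reranking model rather than a local function; the synchronous signature here only keeps the sketch self-contained.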

See https://cohere.com/blog/rerank-4 for the API contract. Reranking is billed per search query, so cost scales with traffic rather than document length or token count.
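Because billing is per query, a rough budget is simple multiplication. The sketch below assumes the "$2.5/K" figure in the provider table means USD 2.50 per 1,000 queries and that each search counts as one billed query; both are assumptions to verify against current pricing. The function name and traffic figure are hypothetical.

```typescript
// Assumption: "$2.5/K" means USD 2.50 per 1,000 rerank queries.
const USD_PER_1K_QUERIES = 2.5;

// Estimated spend for a period, given average daily query volume.
function rerankCostUsd(queriesPerDay: number, days = 30): number {
  return (queriesPerDay * days * USD_PER_1K_QUERIES) / 1000;
}

// 10,000 queries/day over 30 days: 300,000 queries * 2.5 / 1000 = 750
console.log(rerankCostUsd(10000)); // prints 750
```

Note that the number of candidates sent per query does not appear in the formula, which matches the per-query billing described above.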

What To Consider When Choosing a Provider

  • Configuration: Rerankers refine an existing candidate set; they don't replace retrieval. Pair Cohere Rerank 4 Pro with an embedding model, BM25, or hybrid retriever for the first pass. The per-document context of 32K tokens covers query and document tokens together, so very long documents may need chunking before they reach the model.
  • Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
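For the chunking mentioned under Configuration, a minimal sketch might look like the following. It approximates the token budget with a whitespace word count, which is an assumption: real tokenizers produce different counts, so the limit should be set well below the model's context. The function name `chunkByWords` and its parameters are hypothetical.

```typescript
// Split text into windows of at most maxWords words.
// Consecutive windows share `overlap` words so a sentence cut at a boundary
// still appears whole in at least one chunk.
function chunkByWords(text: string, maxWords: number, overlap = 0): string[] {
  if (maxWords <= 0 || overlap < 0 || overlap >= maxWords) {
    throw new RangeError('require maxWords > 0 and 0 <= overlap < maxWords');
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  const step = maxWords - overlap;
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + maxWords).join(' '));
    if (start + maxWords >= words.length) break; // this window reached the end
  }
  return chunks;
}

console.log(chunkByWords('a b c d e', 3, 1)); // prints [ 'a b c', 'c d e' ]
```

Each chunk is then passed to the reranker as its own document, so it helps to keep a mapping from chunk back to source document for the final answer.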

When to Use Cohere Rerank 4 Pro

Best For

  • High-quality enterprise search: Top-k selection drives downstream outcomes in legal, financial, and support contexts
  • Complex query reasoning: Multi-part and conditional queries benefit from full cross-attention
  • Global multilingual RAG: One reranker covers more than 100 languages including cross-lingual matching
  • Long-document corpora: Chunked passages get precise final ordering before they hit the LLM context
  • Hybrid retrieval merging: BM25 and vector candidates share a single relevance score
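For hybrid retrieval merging, BM25 scores and vector similarities are on different scales, so one common approach is to pool the candidates, drop duplicates, and let the reranker assign the only score that is used for ordering. A minimal sketch, assuming each candidate carries a stable `id` (the `Candidate` type and `mergeCandidates` function are hypothetical):

```typescript
interface Candidate {
  id: string;
  text: string;
}

// Pool candidates from several retrievers, keeping the first occurrence of each id.
// First-pass scores are deliberately discarded: the reranker rescored everything.
function mergeCandidates(...lists: Candidate[][]): Candidate[] {
  const byId = new Map<string, Candidate>();
  for (const list of lists) {
    for (const candidate of list) {
      if (!byId.has(candidate.id)) {
        byId.set(candidate.id, candidate);
      }
    }
  }
  return Array.from(byId.values());
}

const bm25Hits: Candidate[] = [
  { id: 'a', text: 'Paris is the capital of France.' },
  { id: 'b', text: 'France borders Spain.' },
];
const vectorHits: Candidate[] = [
  { id: 'b', text: 'France borders Spain.' },
  { id: 'c', text: 'The French capital hosts the Louvre.' },
];

const pooled = mergeCandidates(bm25Hits, vectorHits);
console.log(pooled.map((c) => c.id)); // prints [ 'a', 'b', 'c' ]
```

The `text` fields of the pooled list become the `documents` array of a single rerank call.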

Consider Alternatives When

  • Latency-critical pipelines: rerank-v4-fast is tuned for lower response time and higher throughput
  • English-only corpora: rerank-v3.5 covers English RAG at a lower per-query price
  • First pass alone is enough: First-pass retrieval already meets the accuracy bar, so a second stage adds cost without benefit
  • Multimodal retrieval: Pick a model with native image inputs

Conclusion

Cohere Rerank 4 Pro is the right reranker when ranking accuracy on complex multilingual queries is what the application is judged on. Route it through AI Gateway with model id cohere/rerank-v4-pro to share billing and observability with the embedding and generative models in the same RAG pipeline.