Cohere Rerank 4 Pro
Cohere Rerank 4 Pro is a multilingual reranking model from Cohere built for state-of-the-art relevance on complex queries over English and non-English documents and semi-structured JSON.
```ts
import { rerank } from 'ai';

// Rerank candidate documents against a query through AI Gateway.
const result = await rerank({
  model: 'cohere/rerank-v4-pro',
  query: 'What is the capital of France?',
  documents: [
    'Paris is the capital of France.',
    'Berlin is the capital of Germany.',
    'Madrid is the capital of Spain.',
  ],
});
```

Providers
AI Gateway can route requests across multiple providers, and you can set a provider preference by its slug; see the AI Gateway documentation for details. Using a provider means you agree to its terms.
About Cohere Rerank 4 Pro
Cohere Rerank 4 Pro is the quality tier of the Rerank 4 generation, released December 11, 2025 alongside rerank-v4-fast. Cohere positions it as the strongest Cohere reranker yet, aimed at enterprise search and RAG pipelines where ranking accuracy on complex queries directly drives downstream outcomes.
Cohere Rerank 4 Pro is a cross-encoder: it reads the query and each candidate document together with full attention and scores their relevance jointly, which bi-encoder embedding similarity (where query and document are encoded separately) cannot do. Multi-part queries, queries with conditions, and queries that hinge on a single phrase inside a long document benefit most from this setup.
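To make the contrast concrete, here is a toy TypeScript sketch of the two scoring shapes. `Embed` and `PairScorer` are hypothetical stand-ins for real models, not Cohere or AI SDK APIs. The point is structural: bi-encoder document vectors can be computed ahead of time and compared cheaply, while a cross-encoder needs one model call per (query, document) pair at query time, which is why it reranks a short candidate list rather than searching a whole corpus.

```typescript
// Hypothetical model interfaces for illustration only.
type Embed = (text: string) => number[];
type PairScorer = (query: string, doc: string) => number;

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    na += x * x;
    nb += y * y;
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Bi-encoder: documents were embedded independently (e.g. at index time),
// so query-time scoring is only a vector comparison.
function biEncoderScores(embed: Embed, query: string, docVectors: number[][]): number[] {
  const q = embed(query);
  return docVectors.map((d) => cosine(q, d));
}

// Cross-encoder: the model sees each (query, document) pair together,
// so nothing is precomputed per document and every candidate costs a model call.
function crossEncoderScores(score: PairScorer, query: string, docs: string[]): number[] {
  return docs.map((d) => score(query, d));
}
```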
Multilingual coverage spans more than 100 languages. A query in one language can match documents in another within the same rerank call, so a single index can serve a global user base without separate per-language pipelines. Document types include long-form text, tables, code, and semi-structured JSON.
In a RAG pipeline, the common pattern is to retrieve 50 to 200 candidates with an embedding model, then rerank them with Cohere Rerank 4 Pro down to a top-k of 5 to 20 documents handed to the generative model. The reranker reduces noise in the LLM context, which can improve answer quality more than swapping in a larger generative model would.
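The two-stage pattern can be sketched as follows. `Retriever` and `Reranker` are hypothetical interfaces, not AI SDK or Cohere APIs; in practice the first would wrap an embedding or BM25 index and the second a call like the `rerank` example at the top of this page. The defaults of 100 candidates and a top-k of 10 are illustrative picks within the ranges above.

```typescript
// Hypothetical shapes for illustration; not part of any SDK.
interface Candidate {
  id: string;
  text: string;
}
type Retriever = (query: string, limit: number) => Promise<Candidate[]>;
// Must return one relevance score per document, in input order.
type Reranker = (query: string, docs: string[]) => Promise<number[]>;

async function retrieveThenRerank(
  query: string,
  retrieve: Retriever,
  rerank: Reranker,
  candidateCount = 100, // first-pass recall: 50 to 200 is typical
  topK = 10, // handed to the LLM: 5 to 20 is typical
): Promise<Candidate[]> {
  // Stage 1: cheap, broad retrieval.
  const candidates = await retrieve(query, candidateCount);
  if (candidates.length === 0) return [];

  // Stage 2: score every candidate jointly with the query.
  const scores = await rerank(query, candidates.map((c) => c.text));

  // Order by descending relevance and keep only the top-k for the prompt.
  return candidates
    .map((candidate, i) => ({ candidate, score: scores[i] ?? -Infinity }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map(({ candidate }) => candidate);
}
```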
See https://cohere.com/blog/rerank-4 for more details. Reranking is billed per search query, so cost scales with traffic rather than document length or token count.
What To Consider When Choosing a Provider
- Configuration: Rerankers refine an existing candidate set; they don't replace retrieval. Pair Cohere Rerank 4 Pro with an embedding model, BM25, or a hybrid retriever for the first pass. The 32K-token context applies per document and covers query and document tokens together, so very long documents may need chunking before they reach the model.
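One way to pre-chunk long documents is sketched below, assuming a crude estimate of about 4 characters per token; Cohere's tokenizer will count differently, so the sketch reserves a margin. `chunkForRerank`, `estimateTokens`, and the 500-token margin are illustrative choices, not part of any API.

```typescript
const CONTEXT_TOKENS = 32_000; // per-document context shared by query and document
const CHARS_PER_TOKEN = 4; // ASSUMPTION: rough heuristic, not Cohere's tokenizer

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Split `doc` on whitespace into chunks whose estimated size, plus the query
// and a safety margin, stays within the context window.
function chunkForRerank(doc: string, query: string, marginTokens = 500): string[] {
  const budgetTokens = CONTEXT_TOKENS - estimateTokens(query) - marginTokens;
  if (budgetTokens <= 0) throw new Error("query too long for the context window");
  const budgetChars = budgetTokens * CHARS_PER_TOKEN;

  const words = doc.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  let current = "";
  for (const word of words) {
    const next = current ? `${current} ${word}` : word;
    if (next.length > budgetChars && current) {
      // Adding this word would overflow: close the current chunk.
      chunks.push(current);
      current = word;
    } else {
      current = next;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

A production splitter would usually break on paragraph or sentence boundaries and overlap adjacent chunks; here a single word longer than the budget is simply left as its own oversized chunk.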
- Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
When to Use Cohere Rerank 4 Pro
Best For
- High-quality enterprise search: Top-k selection drives downstream outcomes in legal, financial, and support contexts
- Complex query reasoning: Multi-part and conditional queries benefit from full cross-attention
- Global multilingual RAG: One reranker covers more than 100 languages including cross-lingual matching
- Long-document corpora: Chunked passages get precise final ordering before they hit the LLM context
- Hybrid retrieval merging: BM25 and vector candidates share a single relevance score
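The hybrid-merging case can be sketched as follows: take the union of BM25 and vector hits, deduplicate by id, and rerank the merged set once so every candidate is ordered by the same relevance score. `Candidate`, `Reranker`, `unionById`, and `hybridRerank` are hypothetical names; the reranker would in practice wrap a call to a model such as cohere/rerank-v4-pro.

```typescript
// Hypothetical shapes for illustration; not part of any SDK.
interface Candidate {
  id: string;
  text: string;
}
// Must return one relevance score per document, in input order.
type Reranker = (query: string, docs: string[]) => Promise<number[]>;

// Union of several candidate lists, keeping the first occurrence of each id.
function unionById(...lists: Candidate[][]): Candidate[] {
  const byId = new Map<string, Candidate>();
  for (const list of lists) {
    for (const c of list) {
      if (!byId.has(c.id)) byId.set(c.id, c);
    }
  }
  return Array.from(byId.values());
}

// BM25 scores and cosine similarities are on different scales, so instead of
// fusing them we rerank the merged set once and keep only the reranker's score.
async function hybridRerank(
  query: string,
  bm25Hits: Candidate[],
  vectorHits: Candidate[],
  rerank: Reranker,
  topK: number,
): Promise<Array<Candidate & { score: number }>> {
  const merged = unionById(bm25Hits, vectorHits);
  if (merged.length === 0) return [];
  const scores = await rerank(query, merged.map((c) => c.text));
  return merged
    .map((c, i) => ({ ...c, score: scores[i] ?? -Infinity }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```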
Consider Alternatives When
- Latency-critical pipelines: rerank-v4-fast is tuned for lower response time and higher throughput
- English-only corpora: rerank-v3.5 covers English RAG at a lower per-query price
- First-pass alone is enough: The accuracy bar is met without a second stage
- Multimodal retrieval: Pick a model with native image inputs
Conclusion
Cohere Rerank 4 Pro is the right reranker when ranking accuracy on complex multilingual queries is what the application is judged on. Route it through AI Gateway with model id cohere/rerank-v4-pro to share billing and observability with the embedding and generative models in the same RAG pipeline.