
Cohere Rerank 4 Pro

Cohere Rerank 4 Pro is a multilingual reranking model from Cohere built for state-of-the-art relevance on complex queries over English and non-English documents and semi-structured JSON.

Rerank
index.ts
import { rerank } from 'ai';

const result = await rerank({
  model: 'cohere/rerank-v4-pro',
  query: 'What is the capital of France?',
  documents: [
    'Paris is the capital of France.',
    'Berlin is the capital of Germany.',
    'Madrid is the capital of Spain.',
  ],
});

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

Provider: Cohere
Legal: Terms, Privacy
Context: 32K
Per Query: $2.5/K
Release Date: 12/11/2025

More models by Cohere

  • Context: 32K. Per Query: $2/K. Provider: Cohere. Release Date: 12/11/2025.
  • Input: $0.12/M. Provider: Cohere. Release Date: 04/15/2025.
  • Context: 256K. Latency: 0.2s. Throughput: 75tps. Input: $2.50/M. Output: $10.00/M. Provider: Cohere. Release Date: 03/13/2025.
  • Context: 4K. Per Query: $2/K. Provider: Bedrock. Release Date: 12/02/2024.

About Cohere Rerank 4 Pro

Cohere Rerank 4 Pro is the quality tier of the Rerank 4 generation, released December 11, 2025 alongside rerank-v4-fast. Cohere positions it as its strongest reranker to date, aimed at enterprise search and RAG pipelines where ranking accuracy on complex queries directly drives downstream outcomes.

Reranking is a cross-encoder step. Cohere Rerank 4 Pro reads the query and each candidate document together with full attention, scoring relevance in a way that bi-encoder embedding similarity cannot. Multi-part queries, queries with conditions, and queries that hinge on a single phrase inside a long document benefit most from this setup.

Multilingual coverage spans more than 100 languages. A query in one language can match documents in another within the same rerank call, so a single index can serve a global user base without separate per-language pipelines. Document types include long-form text, tables, code, and semi-structured JSON.

In a RAG pipeline, the common pattern is to retrieve 50 to 200 candidates with an embedding model, then rerank them with Cohere Rerank 4 Pro down to a top-k of 5 to 20 documents handed to the generative model. The reranker reduces noise in the LLM context, which often improves answer quality more than swapping in a larger generative model would.
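
The second stage of that pattern can be sketched as follows. This is a minimal illustration, not the AI Gateway or Cohere API: `rerankTopK`, `PairScorer`, and the toy `wordOverlap` scorer are hypothetical names introduced here, and the scorer is only a stand-in for the call to the rerank model.

```typescript
// Hypothetical stand-in for a reranker: scores one (query, document) pair.
// A real pipeline would call the rerank model here instead.
type PairScorer = (query: string, document: string) => number;

interface RankedDocument {
  index: number; // position in the first-pass candidate list
  document: string;
  score: number;
}

// Second stage of a two-stage pipeline: score every first-pass candidate
// against the query and keep only the topK highest-scoring documents.
function rerankTopK(
  query: string,
  candidates: string[],
  scorePair: PairScorer,
  topK: number,
): RankedDocument[] {
  return candidates
    .map((document, index) => ({ index, document, score: scorePair(query, document) }))
    .sort((a, b) => b.score - a.score || a.index - b.index) // ties keep first-pass order
    .slice(0, topK);
}

// Toy scorer (illustration only): fraction of query words found in the document.
const wordOverlap: PairScorer = (query, document) => {
  const words = query.toLowerCase().split(/\W+/).filter(Boolean);
  const text = document.toLowerCase();
  return words.filter((w) => text.includes(w)).length / Math.max(words.length, 1);
};

const topDocuments = rerankTopK(
  'capital of France',
  [
    'Berlin is the capital of Germany.',
    'Paris is the capital of France.',
    'Madrid is the capital of Spain.',
  ],
  wordOverlap,
  2,
);
```

In practice the candidate list would hold the 50 to 200 first-pass results and `topK` would sit in the 5 to 20 range described above.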

See https://cohere.com/blog/rerank-4 for the API contract. Reranking is billed per search query, so cost scales with traffic rather than document length or token count.
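
As a rough worked example of per-query billing, the sketch below reads the $2.5/K rate listed under Providers on this page as $2.50 per 1,000 queries and assumes each rerank call counts as one billed query. The function name and the traffic figure are hypothetical; check the current rate before relying on the result.

```typescript
// Rate listed under Providers on this page, read as USD per 1,000 queries.
const USD_PER_THOUSAND_QUERIES = 2.5;

// Cost depends only on how many rerank calls are made,
// not on document length or token count.
function estimateRerankCostUsd(
  queryCount: number,
  usdPerThousand: number = USD_PER_THOUSAND_QUERIES,
): number {
  return (queryCount / 1000) * usdPerThousand;
}

// Hypothetical traffic: 400,000 rerank calls in a month.
const monthlyCostUsd = estimateRerankCostUsd(400000);
```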

What To Consider When Choosing a Provider

  • Configuration: Rerankers refine an existing candidate set; they don't replace retrieval. Pair Cohere Rerank 4 Pro with an embedding model, BM25, or hybrid retriever for the first pass. The per-document context of 32K tokens covers query and document tokens together, so very long documents may need chunking before they reach the model.
  • Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
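
The chunking mentioned under Configuration can be sketched as below. This is an illustration under stated assumptions: the 4-characters-per-token estimate is a rough heuristic rather than the model's tokenizer, 32,000 is used as a stand-in for "32K", and `chunkForRerank` is a hypothetical helper.

```typescript
const CONTEXT_TOKENS = 32000; // stand-in for the 32K context; exact limit is an assumption
const CHARS_PER_TOKEN = 4; // rough heuristic, not the model's tokenizer

const estimateTokens = (text: string): number => Math.ceil(text.length / CHARS_PER_TOKEN);

// Split a document on whitespace into chunks that, together with the query,
// stay within the estimated context budget. A single word longer than the
// budget is emitted as its own oversized chunk.
function chunkForRerank(
  document: string,
  query: string,
  contextTokens: number = CONTEXT_TOKENS,
): string[] {
  const budgetTokens = contextTokens - estimateTokens(query);
  if (budgetTokens <= 0) throw new Error('Query alone exceeds the context budget');
  const budgetChars = budgetTokens * CHARS_PER_TOKEN;

  const chunks: string[] = [];
  let current = '';
  for (const word of document.split(/\s+/).filter(Boolean)) {
    const candidate = current ? `${current} ${word}` : word;
    if (candidate.length > budgetChars && current) {
      chunks.push(current); // current chunk is full; start a new one
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each chunk is then passed to the reranker as its own candidate document, so the final ordering is over passages rather than whole files.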

When to Use Cohere Rerank 4 Pro

Best For

  • High-quality enterprise search: Top-k selection drives downstream outcomes in legal, financial, and support contexts
  • Complex query reasoning: Multi-part and conditional queries benefit from full cross-attention
  • Global multilingual RAG: One reranker covers more than 100 languages including cross-lingual matching
  • Long-document corpora: Chunked passages get precise final ordering before they hit the LLM context
  • Hybrid retrieval merging: BM25 and vector candidates share a single relevance score
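
For the hybrid retrieval case, the two first-pass result sets have to be combined into one deduplicated candidate list before the reranker can score them on a single scale. A minimal sketch of that merge step follows; the `Candidate` shape and `mergeCandidates` helper are hypothetical and assume each document carries a stable `id`.

```typescript
interface Candidate {
  id: string; // stable document identifier (assumed to exist in both retrievers)
  text: string;
}

// Merge candidate lists from several retrievers (for example BM25 and vector
// search), keeping the first occurrence of each id and preserving order.
function mergeCandidates(...lists: Candidate[][]): Candidate[] {
  const seen = new Map<string, Candidate>();
  for (const list of lists) {
    for (const candidate of list) {
      if (!seen.has(candidate.id)) seen.set(candidate.id, candidate);
    }
  }
  return Array.from(seen.values());
}

const bm25Hits: Candidate[] = [
  { id: 'a', text: 'Paris is the capital of France.' },
  { id: 'b', text: 'France borders Spain.' },
];
const vectorHits: Candidate[] = [
  { id: 'b', text: 'France borders Spain.' },
  { id: 'c', text: 'The Eiffel Tower is in Paris.' },
];

// The merged texts are what would be sent as the documents of one rerank call.
const merged = mergeCandidates(bm25Hits, vectorHits);
```

Because the reranker scores every merged candidate against the query directly, the incompatible BM25 and vector scores never need to be normalized against each other.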

Consider Alternatives When

  • Latency-critical pipelines: rerank-v4-fast is tuned for lower response time and higher throughput
  • English-only corpora: rerank-v3.5 covers English RAG at a lower per-query price
  • First-pass alone is enough: The accuracy bar is met without a second stage
  • Multimodal retrieval: Pick a model with native image inputs

Conclusion

Cohere Rerank 4 Pro is the right reranker when ranking accuracy on complex multilingual queries is what the application is judged on. Route it through AI Gateway with model id cohere/rerank-v4-pro to share billing and observability with the embedding and generative models in the same RAG pipeline.

Frequently Asked Questions

  • How does Cohere Rerank 4 Pro differ from rerank-v4-fast?

    Cohere Rerank 4 Pro prioritizes ranking quality on complex queries, while rerank-v4-fast is tuned for lower latency and higher throughput. Both share the same multilingual coverage of 100+ languages and the same per-document context of 32K tokens.

  • When does Cohere Rerank 4 Pro outperform an embedding model alone?

    On multi-part queries, queries with conditions, and queries where relevance hinges on a single phrase inside a long document. Cross-attention between the query and the full document text picks up signal that bi-encoder embedding similarity flattens.

  • Which document types does Cohere Rerank 4 Pro support?

    Long-form text, semi-structured JSON, tables, code, and email-style records. The per-document context window is 32K tokens, shared between query and document tokens.

  • Does Cohere Rerank 4 Pro handle multilingual search?

    Yes. It covers more than 100 languages and supports cross-lingual matching, so a query in one language can rank documents written in another within the same call.

  • How is Cohere Rerank 4 Pro billed on AI Gateway?

    Reranking is priced per search query rather than per token. Cost scales with the number of rerank calls rather than document length. See the pricing section on this page for the current rate.

  • How does Cohere Rerank 4 Pro compare to rerank-v3.5?

    rerank-v3.5 is the December 2024 English-focused reranker. Cohere Rerank 4 Pro is the December 2025 multilingual quality tier with broader language coverage and stronger reasoning on complex queries.

  • Does Cohere Rerank 4 Pro support Zero Data Retention?

    Not currently. AI Gateway offers Zero Data Retention on a per-provider basis, and it is not available for this model. See https://vercel.com/docs/ai-gateway/capabilities/zdr for details.