
Cohere Rerank 4 Fast

Cohere Rerank 4 Fast is a multilingual reranking model from Cohere tuned for low-latency, high-throughput retrieval over English and non-English documents and semi-structured JSON.

Rerank
index.ts
```typescript
import { rerank } from 'ai';

// Rerank three candidate passages against one query through AI Gateway.
const result = await rerank({
  model: 'cohere/rerank-v4-fast',
  query: 'What is the capital of France?',
  documents: [
    'Paris is the capital of France.',
    'Berlin is the capital of Germany.',
    'Madrid is the capital of Spain.',
  ],
});
```

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

| Provider | Context | Per Query | Release Date |
|---|---|---|---|
| Cohere | 32K | $2/K | 12/11/2025 |

More models by Cohere

| Model | Context | Latency | Throughput | Input | Output | Per Query | Providers | Release Date |
|---|---|---|---|---|---|---|---|---|
|  | 32K |  |  |  |  | $2.5/K | Cohere | 12/11/2025 |
|  | 128K |  |  | $0.12/M |  |  | Bedrock, Cohere | 04/15/2025 |
|  | 256K | 0.2s | 75tps | $2.50/M | $10.00/M |  | Cohere | 03/13/2025 |
|  | 4K |  |  |  |  | $2/K | Bedrock | 12/02/2024 |

About Cohere Rerank 4 Fast

Cohere Rerank 4 Fast is the fast tier of Cohere's Rerank 4 generation, released December 11, 2025 alongside rerank-v4-pro. It shares the v4 family's multilingual coverage and JSON support but is tuned for lower per-query latency and higher throughput.

Like its siblings, Cohere Rerank 4 Fast is a cross-encoder. It reads the query and each candidate document together through attention, scoring relevance directly rather than relying on independent vector similarity. That structure picks up signal on complex, multi-part, or ambiguous queries that bi-encoder embeddings flatten.
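To make the structural difference concrete, here is a toy sketch. Both scorers are hypothetical stand-ins, not Cohere's model: `biEncoderScore` compares two bag-of-words representations built independently, so word order is lost, while `jointScore` sees the query and document together and can reward query word pairs that appear in order. A real cross-encoder learns its joint scoring through attention rather than a hand-written rule.

```typescript
// Toy illustration only: neither scorer is Cohere's model.
type Scored = { index: number; score: number };

// Hypothetical tokenizer: lowercase ASCII word runs.
function tokens(text: string): string[] {
  return text.toLowerCase().split(/\W+/).filter(Boolean);
}

// Bi-encoder style: each text is reduced to a set on its own, and relevance
// is Jaccard overlap of the two sets. Word order is invisible here.
function biEncoderScore(query: string, doc: string): number {
  const q = new Set(tokens(query));
  const d = new Set(tokens(doc));
  const shared = Array.from(q).filter((t) => d.has(t)).length;
  const union = new Set(Array.from(q).concat(Array.from(d))).size;
  return union === 0 ? 0 : shared / union;
}

// Cross-encoder style: the scorer reads the pair together, so it can check
// whether adjacent query words also appear adjacently in the document.
function jointScore(query: string, doc: string): number {
  const q = tokens(query);
  const padded = ` ${tokens(doc).join(" ")} `;
  let inOrder = 0;
  for (let i = 0; i + 1 < q.length; i++) {
    if (padded.includes(` ${q[i]} ${q[i + 1]} `)) inOrder++;
  }
  return inOrder + biEncoderScore(query, doc);
}

// Reranking: score every (query, document) pair, then sort by score descending.
function rerankWith(
  scorer: (q: string, d: string) => number,
  query: string,
  docs: string[],
): Scored[] {
  return docs
    .map((doc, index) => ({ index, score: scorer(query, doc) }))
    .sort((a, b) => b.score - a.score);
}
```

With the query "dog bites man", the bag-of-words scorer gives "man bites dog" and "dog bites man" identical scores, while the joint scorer ranks the in-order document first.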

Cohere Rerank 4 Fast covers more than 100 languages, matching the multilingual coverage of Cohere's embed-multilingual family, so a French query can match Japanese documents and vice versa under one model. It handles long-form text, tables, code, and semi-structured JSON records within a per-document context window of 32K tokens.
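This page does not spell out how JSON records should be encoded in a request. A conservative sketch, assuming documents are sent as strings: serialize each record with `JSON.stringify`, keeping only the fields that should influence relevance. `toRerankDocuments` and the field list are hypothetical helpers, not part of any SDK.

```typescript
// Hypothetical helper: turn semi-structured records into string documents.
// Assumption: the reranker receives each record as a JSON string.
function toRerankDocuments(
  records: Record<string, unknown>[],
  fields: string[],
): string[] {
  return records.map((record) => {
    const picked: Record<string, unknown> = {};
    // Keep only the requested fields, in the order given; skip missing ones.
    for (const f of fields) {
      if (f in record) picked[f] = record[f];
    }
    return JSON.stringify(picked);
  });
}
```

The resulting string array can be passed as `documents` in the `rerank` call shown earlier. Dropping fields such as internal IDs or prices keeps tokens focused on the text that actually signals relevance.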

The fast variant suits high-traffic search systems and agentic retrieval steps where the reranker runs on every user turn. It pairs naturally with a multilingual embedding retriever for first-pass candidate selection, and it trades some quality relative to rerank-v4-pro for faster responses under load.
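A minimal sketch of the first-pass stage, assuming document embeddings from some multilingual embedding model are already computed (the vectors in the usage below are toy values). `topKByCosine` is a hypothetical helper that keeps the k most similar documents; only those candidates, not the whole corpus, are then sent to the reranker.

```typescript
type Embedded = { text: string; vector: number[] };

// Cosine similarity between two vectors of equal dimension.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// First pass: rank the corpus by similarity to the query vector and keep
// the top k texts as the candidate pool for the reranker.
function topKByCosine(query: number[], corpus: Embedded[], k: number): string[] {
  return corpus
    .map((d) => ({ text: d.text, score: cosine(query, d.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((d) => d.text);
}
```

In a full pipeline the returned texts become the `documents` argument of `rerank`, with the user's query as `query`. Keeping k modest (tens to low hundreds of candidates) is what keeps the second stage cheap.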

See https://cohere.com/blog/rerank-4 for the request format, including top_n, return_documents, and per-document tokenization rules. Reranking is billed per search query rather than per token.
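Because billing is per query, cost scales with request count rather than document length. A back-of-the-envelope sketch, assuming the $2 per 1,000 queries listed in the provider table and that each `rerank` call counts as one query (the provider's rules for very large document batches may differ):

```typescript
// Assumed price from the provider table: $2 per 1,000 queries.
const PRICE_PER_THOUSAND_QUERIES_USD = 2;

// Estimated spend for a given number of rerank queries.
function estimateRerankCostUsd(
  queries: number,
  pricePerThousand: number = PRICE_PER_THOUSAND_QUERIES_USD,
): number {
  return (queries / 1000) * pricePerThousand;
}
```

At that rate, 1,000,000 reranked queries a month comes to about $2,000. An agent that reranks on every turn multiplies query count by turns per session, so that figure is the one to budget against.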

What To Consider When Choosing a Provider

  • Configuration: Rerankers run after an initial retrieval step. Pair Cohere Rerank 4 Fast with an embedding model, keyword search, or hybrid retriever so it has a candidate pool to score. The per-document context of 32K tokens covers query and document tokens together, which is wide enough to score long passages and chunked enterprise documents without truncation in most cases.
  • Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
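When a document might exceed the window, one option is to chunk it before reranking. A rough sketch, assuming about 4 characters per token (a heuristic, not Cohere's tokenizer) and reading "32K" conservatively as 32,000 tokens shared by query and document. `chunkForRerank` and its 10% safety margin are hypothetical choices.

```typescript
const CONTEXT_TOKENS = 32_000; // conservative reading of the 32K window
const CHARS_PER_TOKEN = 4; // crude heuristic; real token counts vary by language

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Split a document so that query + each chunk fits the shared window,
// keeping a margin for tokenizer error. Splits on character boundaries;
// a production pipeline would prefer paragraph or sentence boundaries.
function chunkForRerank(query: string, doc: string, margin = 0.1): string[] {
  const budgetTokens = Math.floor(
    (CONTEXT_TOKENS - estimateTokens(query)) * (1 - margin),
  );
  if (budgetTokens <= 0) throw new Error("query alone exceeds the context budget");
  const maxChars = budgetTokens * CHARS_PER_TOKEN;
  const chunks: string[] = [];
  for (let start = 0; start < doc.length; start += maxChars) {
    chunks.push(doc.slice(start, start + maxChars));
  }
  return chunks;
}
```

Each chunk is then reranked as its own document; a document's best-scoring chunk can stand in for the whole document when assembling final results.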

When to Use Cohere Rerank 4 Fast

Best For

  • Multilingual reranking: One model covers 100+ languages for cross-lingual retrieval
  • High-throughput search: Lower latency keeps reranker overhead small under load
  • Agentic retrieval loops: Tool calls that retrieve then rerank on every turn stay within tight time budgets
  • Mixed-format corpora: Long text, JSON, tables, and code share one reranking stage
  • Cost-sensitive RAG: A lower per-query price than the Pro variant adds up at scale

Consider Alternatives When

  • Maximum quality: rerank-v4-pro targets state-of-the-art relevance on complex queries
  • English-only corpora: rerank-v3.5 covers English RAG at a lower price point
  • No second stage needed: First-pass retrieval alone meets the accuracy bar
  • Image or multimodal retrieval: Use a multimodal retrieval model instead

Conclusion

Cohere Rerank 4 Fast is the right reranker when multilingual coverage and latency both matter and a small quality gap versus the Pro variant is acceptable. Route it through AI Gateway with model id cohere/rerank-v4-fast for unified billing across the retrieval and generation stages of the same pipeline.