Cohere Rerank 4 Fast
Cohere Rerank 4 Fast is a multilingual reranking model from Cohere tuned for low-latency, high-throughput retrieval over English and non-English documents and semi-structured JSON.
```ts
import { rerank } from 'ai';

const result = await rerank({
  model: 'cohere/rerank-v4-fast',
  query: 'What is the capital of France?',
  documents: [
    'Paris is the capital of France.',
    'Berlin is the capital of Germany.',
    'Madrid is the capital of Spain.',
  ],
});
```
Providers
Route requests across multiple providers and set a provider preference by slug; see the docs for details. Using a provider means you agree to their terms, listed under Legal.
About Cohere Rerank 4 Fast
Cohere Rerank 4 Fast is the fast tier of Cohere's Rerank 4 generation, released December 11, 2025 alongside rerank-v4-pro. It shares the v4 family's multilingual coverage and JSON support but is tuned for lower per-query latency and higher throughput.
Like its siblings, Cohere Rerank 4 Fast is a cross-encoder. It reads the query and each candidate document together through attention, scoring relevance directly rather than relying on independent vector similarity. That structure picks up signal on complex, multi-part, or ambiguous queries that bi-encoder embeddings flatten.
Cohere Rerank 4 Fast covers more than 100 languages, matching the multilingual coverage of Cohere's embed-multilingual family. Queries and documents need not share a language: a French query can match Japanese documents, and vice versa, under one model. It handles long-form text, tables, code, and semi-structured JSON records within a per-document context window of 32K tokens.
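One way to prepare semi-structured records for a reranker is to flatten each record into a text document with a stable field order. The following is a minimal sketch under that assumption: `recordToDocument`, the `JsonRecord` type, the field list, and the sample records are hypothetical and are not part of Cohere's or AI Gateway's API.

```typescript
// Hypothetical helper (not a Cohere or AI Gateway API): render a flat JSON
// record as "key: value" lines so every document shares the same layout.
type JsonRecord = Record<string, string | number | boolean | null>;

function recordToDocument(record: JsonRecord, fields: string[]): string {
  return fields
    // Skip fields that are missing or explicitly null.
    .filter((field) => record[field] !== undefined && record[field] !== null)
    .map((field) => `${field}: ${String(record[field])}`)
    .join('\n');
}

// Illustrative records, made up for this sketch.
const records: JsonRecord[] = [
  { title: 'Paris', country: 'France', population: 2100000 },
  { title: 'Berlin', country: 'Germany', population: null },
];

// Strings like these could then be passed as the `documents` array.
const documents: string[] = records.map((r) =>
  recordToDocument(r, ['title', 'country', 'population']),
);
```

Fixing the field order puts the most informative fields first in every document, which matters if a long record is ever truncated.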
The fast variant fits at the front of high-traffic search systems and in agentic retrieval steps where the reranker runs on every user turn. It pairs naturally with a multilingual embedding retriever for first-pass candidate selection, and it trades some quality relative to rerank-v4-pro for faster response times under load.
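The two-stage pattern described above can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: `firstPass` is a toy keyword filter standing in for an embedding or hybrid retriever, and the `Reranker` type is a hypothetical interface that a real call to the reranking model would sit behind.

```typescript
// Hypothetical interface: a reranker scores each candidate for a query and
// returns (index into candidates, score) pairs. A real implementation would
// wrap a call to the reranking model; here it is injected by the caller.
type Scored = { index: number; score: number };
type Reranker = (query: string, candidates: string[]) => Promise<Scored[]>;

// Stage 1 (toy stand-in for a real retriever): keep documents that share
// at least one lowercase term with the query, up to `limit` candidates.
function firstPass(query: string, corpus: string[], limit: number): string[] {
  const terms = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
  return corpus
    .filter((doc) => doc.toLowerCase().split(/\W+/).some((t) => terms.has(t)))
    .slice(0, limit);
}

// Stage 2: rerank only the candidate pool and keep the top N documents.
async function retrieveThenRerank(
  query: string,
  corpus: string[],
  rerankFn: Reranker,
  candidateLimit: number,
  topN: number,
): Promise<string[]> {
  const candidates = firstPass(query, corpus, candidateLimit);
  if (candidates.length === 0) return [];
  const scored = await rerankFn(query, candidates);
  return [...scored]
    .sort((a, b) => b.score - a.score) // highest relevance first
    .slice(0, topN)
    .map((s) => candidates[s.index]);
}
```

Keeping the reranker behind a function type means the candidate limit can be tuned independently of the model: a smaller pool lowers reranking latency, while a larger pool gives the reranker more chances to recover documents the first pass ranked poorly.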
See https://cohere.com/blog/rerank-4 for the request format, including top_n, return_documents, and per-document tokenization rules. Reranking is billed per search query rather than per token.
What To Consider When Choosing a Provider
- Configuration: Rerankers run after an initial retrieval step. Pair Cohere Rerank 4 Fast with an embedding model, keyword search, or hybrid retriever so it has a candidate pool to score. The per-document context of 32K tokens covers query and document tokens together, which is wide enough to score long passages and chunked enterprise documents without truncation in most cases.
- Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
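Because the context window covers query and document tokens together, a rough pre-flight check can flag candidates that are likely to be truncated. The sketch below rests on two assumptions that do not come from Cohere's documentation: a heuristic of about 4 characters per token, and a budget of 32,000 tokens for "32K". Exact counts require the model's tokenizer, and the function names are hypothetical.

```typescript
// Assumptions for this sketch only (not from Cohere's documentation):
// roughly 4 characters per token, and "32K" taken as 32,000 tokens.
const CONTEXT_BUDGET_TOKENS = 32000;
const CHARS_PER_TOKEN = 4;

// Crude length-based estimate; a real tokenizer will give different counts.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Returns the indices of documents whose estimated query + document size
// exceeds the budget, so they can be chunked before reranking.
function findOversizedDocuments(query: string, documents: string[]): number[] {
  const queryTokens = estimateTokens(query);
  const oversized: number[] = [];
  documents.forEach((doc, i) => {
    if (queryTokens + estimateTokens(doc) > CONTEXT_BUDGET_TOKENS) {
      oversized.push(i);
    }
  });
  return oversized;
}
```

Documents flagged this way are candidates for chunking in the retrieval stage rather than being left to truncation at scoring time.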
When to Use Cohere Rerank 4 Fast
Best For
- Multilingual reranking: One model covers 100+ languages for cross-lingual retrieval
- High-throughput search: Lower latency keeps reranker overhead small under load
- Agentic retrieval loops: Tool calls that retrieve then rerank on every turn stay within tight time budgets
- Mixed-format corpora: Long text, JSON, tables, and code share one reranking stage
- Cost-sensitive RAG: A lower per-query price than the Pro variant at scale
Consider Alternatives When
- Maximum quality: rerank-v4-pro targets state-of-the-art relevance on complex queries
- English-only corpora: rerank-v3.5 covers English RAG at a lower price point
- No second stage needed: First-pass retrieval alone meets the accuracy bar
- Image or multimodal retrieval: Use a multimodal retrieval model instead
Conclusion
Cohere Rerank 4 Fast is the right reranker when multilingual coverage and latency both matter and a small quality gap versus the Pro variant is acceptable. Route it through AI Gateway with the model ID cohere/rerank-v4-fast for unified billing across the retrieval and generation stages of the same pipeline.