Cohere Rerank 4 Fast
Cohere Rerank 4 Fast is a multilingual reranking model from Cohere tuned for low-latency, high-throughput retrieval over English and non-English documents and semi-structured JSON.
import { rerank } from 'ai';

const result = await rerank({
  model: 'cohere/rerank-v4-fast',
  query: 'What is the capital of France?',
  documents: [
    'Paris is the capital of France.',
    'Berlin is the capital of Germany.',
    'Madrid is the capital of Spain.',
  ],
});

Providers
Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
About Cohere Rerank 4 Fast
Cohere Rerank 4 Fast is the fast tier of Cohere's Rerank 4 generation, released December 11, 2025 alongside rerank-v4-pro. It shares the v4 family's multilingual coverage and JSON support but is tuned for lower per-query latency and higher throughput.
Like its siblings, Cohere Rerank 4 Fast is a cross-encoder. It reads the query and each candidate document together through attention, scoring relevance directly rather than relying on independent vector similarity. That structure picks up signal on complex, multi-part, or ambiguous queries that bi-encoder embeddings flatten.
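The difference between the two scoring shapes can be sketched in a few lines. This is illustrative only: `Embedder` and `PairScorer` are hypothetical function types standing in for real models, and nothing here reflects Cohere's internals. The point is the call pattern: a bi-encoder embeds query and document separately and compares vectors, while a cross-encoder sees each (query, document) pair together.

```typescript
type Vector = number[];

// Hypothetical stand-ins for real models (assumptions, not a Cohere API).
type Embedder = (text: string) => Vector;
type PairScorer = (query: string, document: string) => number;

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: Vector, b: Vector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Bi-encoder shape: each text is embedded on its own, so document
// vectors can be precomputed, but the model never sees the pair together.
function biEncoderScores(embed: Embedder, query: string, docs: string[]): number[] {
  const queryVector = embed(query);
  return docs.map((doc) => cosineSimilarity(queryVector, embed(doc)));
}

// Cross-encoder shape: one scoring pass per (query, document) pair.
function crossEncoderScores(scorePair: PairScorer, query: string, docs: string[]): number[] {
  return docs.map((doc) => scorePair(query, doc));
}
```

Because the cross-encoder shape costs one model pass per candidate, it is usually applied to a short candidate list rather than a whole corpus.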
Cohere Rerank 4 Fast covers more than 100 languages, with the same multilingual coverage as Cohere's embed-multilingual family. A French query can match Japanese documents and vice versa under one model. It handles long-form text, tables, code, and semi-structured JSON records, all within a per-document context window of 32K tokens.
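For semi-structured records, one option is to flatten each record into a "key: value" string before reranking, so that the most relevant fields come first and internal fields are left out. The sketch below assumes that approach; `recordToText` and the sample record are hypothetical, and whether a given client also accepts raw JSON objects is not something this example asserts.

```typescript
type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string]: JsonValue };
type JsonRecord = { [key: string]: JsonValue };

// Render selected fields of a record as "key: value" lines, in the given order.
// Missing or null fields are skipped; nested values are JSON-encoded.
function recordToText(record: JsonRecord, fieldOrder: string[]): string {
  const lines: string[] = [];
  for (const field of fieldOrder) {
    const value: JsonValue | undefined = record[field];
    if (value === undefined || value === null) continue;
    const rendered = typeof value === 'object' ? JSON.stringify(value) : String(value);
    lines.push(`${field}: ${rendered}`);
  }
  return lines.join('\n');
}

// Hypothetical email-style record; internalId is deliberately excluded.
const emailRecord: JsonRecord = {
  subject: 'Q3 invoice',
  from: 'billing@example.com',
  body: 'Invoice attached.',
  internalId: 42,
};

const emailText = recordToText(emailRecord, ['subject', 'from', 'body']);
```

Putting the fields in an explicit order keeps the most important ones at the front if a long record is ever truncated.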
The fast variant fits at the front of high-traffic search systems and agentic retrieval steps where the reranker runs on every user turn. It pairs naturally with a multilingual embedding retriever for first-pass candidate selection and trades some quality versus rerank-v4-pro for response time under load.
See https://cohere.com/blog/rerank-4 for the request format, including top_n, return_documents, and per-document tokenization rules. Reranking is billed per search query rather than per token.
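When documents are not returned in the response, each result has to be mapped back to the original input by position. The sketch below assumes results carry an `index` into the submitted list and a `relevance_score`; treat those field names as assumptions to check against the linked reference. `attachDocuments` is a hypothetical helper, not part of any SDK.

```typescript
// Assumed result shape (verify against the provider's reference).
interface RerankResult {
  index: number;
  relevance_score: number;
}

interface RankedDocument {
  document: string;
  originalIndex: number;
  score: number;
}

// Join results back to the submitted documents and order by descending score.
// Results whose index falls outside the input list are dropped.
function attachDocuments(results: RerankResult[], documents: string[]): RankedDocument[] {
  return results
    .filter((r) => Number.isInteger(r.index) && r.index >= 0 && r.index < documents.length)
    .map((r) => ({
      document: documents[r.index],
      originalIndex: r.index,
      score: r.relevance_score,
    }))
    .sort((a, b) => b.score - a.score);
}
```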
What To Consider When Choosing a Provider
- Configuration: Rerankers run after an initial retrieval step. Pair Cohere Rerank 4 Fast with an embedding model, keyword search, or hybrid retriever so it has a candidate pool to score. The per-document context of 32K tokens covers query and document tokens together, which is wide enough to score long passages and chunked enterprise documents without truncation in most cases.
- Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
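Since the Configuration note says the 32K-token context covers query and document tokens together, a rough pre-check can flag candidates that may be truncated. This is a sketch under stated assumptions: the 4-characters-per-token ratio is a coarse heuristic for English text and not Cohere's tokenizer, 32000 is an assumed reading of "32K", and `likelyTruncated` is a hypothetical helper.

```typescript
// Assumptions: "32K" read as 32000 tokens; ~4 characters per token.
const ASSUMED_CONTEXT_TOKENS = 32000;
const ASSUMED_CHARS_PER_TOKEN = 4;

// Very rough token estimate; real tokenizers vary by language and content.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / ASSUMED_CHARS_PER_TOKEN);
}

// Return the indexes of documents whose estimated size, combined with the
// query, exceeds the context budget.
function likelyTruncated(
  query: string,
  documents: string[],
  contextTokens: number = ASSUMED_CONTEXT_TOKENS,
): number[] {
  const queryTokens = estimateTokens(query);
  const flagged: number[] = [];
  documents.forEach((doc, i) => {
    if (queryTokens + estimateTokens(doc) > contextTokens) flagged.push(i);
  });
  return flagged;
}
```

Flagged documents are candidates for chunking before the rerank call. The estimate is least reliable for non-English text and code, where characters per token can differ widely.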
When to Use Cohere Rerank 4 Fast
Best For
- Multilingual reranking: One model covers 100+ languages for cross-lingual retrieval
- High-throughput search: Lower latency keeps reranker overhead small under load
- Agentic retrieval loops: Tool calls that retrieve then rerank on every turn stay within tight time budgets
- Mixed-format corpora: Long text, JSON, tables, and code share one reranking stage
- Cost-sensitive RAG: A lower per-query price than the Pro variant at scale
Consider Alternatives When
- Maximum quality: rerank-v4-pro targets state-of-the-art relevance on complex queries
- English-only corpora: rerank-v3.5 covers English RAG at a lower price point
- No second stage needed: First-pass retrieval alone meets the accuracy bar
- Image or multimodal retrieval: Use a multimodal retrieval model instead
Conclusion
Cohere Rerank 4 Fast is the right reranker when multilingual coverage and latency both matter, and a small quality gap versus the Pro variant is acceptable. Route it through AI Gateway with model id cohere/rerank-v4-fast for unified billing across the retrieval and generation stages of the same pipeline.
Frequently Asked Questions
How does Cohere Rerank 4 Fast differ from rerank-v4-pro?
Cohere Rerank 4 Fast is the latency-optimized variant in the Rerank 4 family, tuned for low-latency, high-throughput use cases. rerank-v4-pro targets the highest relevance quality on complex queries. Both share the same multilingual coverage and per-document context of 32K tokens.
Which languages does Cohere Rerank 4 Fast support?
More than 100 languages, with the same multilingual coverage as Cohere's embed-multilingual family. Cross-lingual queries work in one call, so a query in one language can match documents in another.
What document types can Cohere Rerank 4 Fast rerank?
Long-form text, semi-structured JSON, tables, code, and email-style records. The per-document context window is 32K tokens, shared between the query and the document.
How is Cohere Rerank 4 Fast billed on AI Gateway?
Reranking is priced per search query rather than per token. See the pricing section on this page for the current per-query rate.
Do I still need an embedding model with Cohere Rerank 4 Fast?
Yes, for the first-pass retrieval step. Cohere Rerank 4 Fast scores a candidate set against the query; it does not retrieve from the full corpus. A common pattern is to retrieve 50 to 200 candidates with an embedding model, then rerank them down to a top-k of 5 to 20.
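The retrieve-then-rerank pattern described above can be sketched with injected functions. `Retriever` and `Reranker` are hypothetical signatures standing in for a vector search and a reranking call; the defaults of 100 candidates and a top-k of 10 are picked from the ranges mentioned above and are not recommendations.

```typescript
interface ScoredIndex {
  index: number; // position in the candidate list passed to the reranker
  score: number; // higher means more relevant
}

// Hypothetical stage signatures (assumptions, not an SDK API).
type Retriever = (query: string, limit: number) => Promise<string[]>;
type Reranker = (query: string, documents: string[]) => Promise<ScoredIndex[]>;

// First pass pulls a wide candidate set; the reranker narrows it to topK.
async function retrieveThenRerank(
  query: string,
  retrieve: Retriever,
  rerankCandidates: Reranker,
  candidateCount: number = 100,
  topK: number = 10,
): Promise<string[]> {
  const candidates = await retrieve(query, candidateCount);
  if (candidates.length === 0) return [];

  const scored = await rerankCandidates(query, candidates);
  return [...scored]
    .filter((s) => s.index >= 0 && s.index < candidates.length)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((s) => candidates[s.index]);
}
```

Keeping both stages behind plain function types makes it easy to swap the retriever (embedding, keyword, or hybrid) without touching the reranking step.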
How does Cohere Rerank 4 Fast compare to rerank-v3.5?
rerank-v3.5 targets English and matches embed-multilingual-v3.0 coverage. Cohere Rerank 4 Fast is part of the Rerank 4 generation, explicitly multilingual across 100+ languages, and tuned for lower latency than rerank-v4-pro.
Does Cohere Rerank 4 Fast support Zero Data Retention?
Zero Data Retention is not currently available for this model. It is offered on a per-provider basis. See https://vercel.com/docs/ai-gateway/capabilities/zdr for details.