
Cohere Rerank 4 Fast

Cohere Rerank 4 Fast is a multilingual reranking model from Cohere tuned for low-latency, high-throughput retrieval over English and non-English documents and semi-structured JSON.

Rerank
index.ts
import { rerank } from 'ai';

const result = await rerank({
  model: 'cohere/rerank-v4-fast',
  query: 'What is the capital of France?',
  documents: [
    'Paris is the capital of France.',
    'Berlin is the capital of Germany.',
    'Madrid is the capital of Spain.',
  ],
});

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

Provider   Context   Per Query   Release Date
Cohere     32K       $2/K        12/11/2025

Legal: Terms, Privacy

More models by Cohere

Context   Latency   Throughput   Input      Output      Per Query   Providers   Release Date
32K       -         -            -          -           $2.5/K      Cohere      12/11/2025
-         -         -            $0.12/M    -           -           Cohere      04/15/2025
256K      0.2s      75tps        $2.50/M    $10.00/M    -           Cohere      03/13/2025
4K        -         -            -          -           $2/K        Bedrock     12/02/2024

About Cohere Rerank 4 Fast

Cohere Rerank 4 Fast is the fast tier of Cohere's Rerank 4 generation, released December 11, 2025 alongside rerank-v4-pro. It shares the v4 family's multilingual coverage and JSON support but is tuned for lower per-query latency and higher throughput.

Like its siblings, Cohere Rerank 4 Fast is a cross-encoder. It reads the query and each candidate document together through attention, scoring relevance directly rather than relying on independent vector similarity. That structure picks up signal on complex, multi-part, or ambiguous queries that bi-encoder embeddings flatten.

Cohere Rerank 4 Fast covers more than 100 languages, matching the multilingual coverage of Cohere's embed-multilingual family. Because one model handles every language, a French query can match Japanese documents and vice versa. It handles long-form text, tables, code, and semi-structured JSON records, each within a per-document context window of 32K tokens.
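Semi-structured records often need to be flattened into one string per document before they are scored. The following is a minimal sketch of one way to do that, assuming a plain "key: value" rendering is acceptable for your data. The helper name `recordToDocument` and the `fieldOrder` parameter are hypothetical and are not part of the Cohere API or the AI SDK.

```typescript
// Hypothetical helper: renders a semi-structured record as "key: value" lines
// so it can be handed to a reranker as a single string document.
type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string]: JsonValue };

function recordToDocument(record: { [key: string]: JsonValue }, fieldOrder: string[] = []): string {
  // Listed fields come first (for example a title), then the rest in insertion order.
  const listed = fieldOrder.filter((key) => key in record);
  const rest = Object.keys(record).filter((key) => !listed.includes(key));
  return [...listed, ...rest]
    .map((key) => {
      const value = record[key];
      // Nested objects and arrays are kept as compact JSON; scalars are printed as-is.
      const text = typeof value === 'object' && value !== null ? JSON.stringify(value) : String(value);
      return `${key}: ${text}`;
    })
    .join('\n');
}

const ticket = { id: 42, body: 'Login fails after password reset', title: 'Login bug', tags: ['auth', 'p1'] };
const ticketDocument = recordToDocument(ticket, ['title', 'body']);
```

Putting the most informative fields first is a design choice in this sketch: if a very long record ever exceeds the context window, the leading fields are the ones most likely to be read.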

The fast variant fits at the front of high-traffic search systems and in agentic retrieval steps where the reranker runs on every user turn. It pairs naturally with a multilingual embedding retriever for first-pass candidate selection, and it trades some relevance quality relative to rerank-v4-pro for faster responses under load.
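The retrieve-then-rerank pairing described above can be sketched as a small function. This is an illustration under stated assumptions, not a reference implementation: both stages are injected as plain functions, the type and function names are hypothetical, and the toy retriever and word-overlap scorer stand in for a real embedding retriever and for Cohere Rerank 4 Fast. The default pool size and top-k are assumptions as well.

```typescript
// Two-stage retrieval sketch. Both stages are injected so the example does not
// depend on any specific SDK; all names and shapes here are hypothetical.
type Scored = { text: string; score: number };
type Retriever = (query: string, limit: number) => string[];
type Reranker = (query: string, docs: string[]) => Scored[];

function retrieveThenRerank(
  query: string,
  retrieve: Retriever,
  rerankCandidates: Reranker,
  candidateCount = 100, // first-pass pool size (assumed default)
  topK = 10, // documents kept after reranking (assumed default)
): Scored[] {
  const candidates = retrieve(query, candidateCount);
  if (candidates.length === 0) return [];
  // Sort a copy by descending relevance score, then keep the best topK.
  return [...rerankCandidates(query, candidates)]
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// Toy stand-ins: a fixed corpus and a word-overlap scorer in place of real models.
const corpus = [
  'Berlin is the capital of Germany.',
  'Paris is the capital of France.',
  'Madrid is the capital of Spain.',
];
const toyRetrieve: Retriever = (_query, limit) => corpus.slice(0, limit);
const toyRerank: Reranker = (query, docs) => {
  const terms = query.toLowerCase().match(/[a-z]+/g) ?? [];
  return docs.map((doc) => ({
    text: doc,
    score: terms.filter((term) => doc.toLowerCase().includes(term)).length,
  }));
};

const bestMatches = retrieveThenRerank('What is the capital of France?', toyRetrieve, toyRerank, 3, 1);
```

In a real pipeline the `Reranker` would wrap a call to the reranking model and the `Retriever` would query a vector or keyword index; keeping them as parameters makes the ordering logic easy to test in isolation.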

See https://cohere.com/blog/rerank-4 for the request format, including top_n, return_documents, and per-document tokenization rules. Reranking is billed per search query rather than per token.
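Because billing is per search query, a rough cost estimate depends on call volume rather than on document length. The sketch below assumes the "$2/K" figure in the provider table on this page means USD 2 per 1,000 search queries; that reading, the constant, and the function name are assumptions for illustration, so check the current pricing before relying on the numbers.

```typescript
// Per-query billing sketch. The rate is an ASSUMPTION read from the provider
// table on this page ("$2/K", taken to mean USD 2 per 1,000 search queries).
const ASSUMED_USD_PER_THOUSAND_QUERIES = 2;

function estimateRerankCostUsd(
  searchQueries: number,
  usdPerThousand = ASSUMED_USD_PER_THOUSAND_QUERIES,
): number {
  if (!Number.isFinite(searchQueries) || searchQueries < 0) {
    throw new RangeError('searchQueries must be a non-negative number');
  }
  // Cost scales with the number of rerank calls, not with token counts.
  return (searchQueries / 1000) * usdPerThousand;
}

// 250,000 rerank calls in a month at the assumed rate.
const monthlyCostUsd = estimateRerankCostUsd(250_000); // 500
```

Passing a different `usdPerThousand` makes it easy to compare tiers or providers for the same traffic.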

What To Consider When Choosing a Provider

  • Configuration: Rerankers run after an initial retrieval step. Pair Cohere Rerank 4 Fast with an embedding model, keyword search, or hybrid retriever so it has a candidate pool to score. The per-document context of 32K tokens covers query and document tokens together, which is wide enough to score long passages and chunked enterprise documents without truncation in most cases.
  • Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
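The Configuration note above says the 32K context covers the query and the document together. A rough pre-flight check can flag candidates that are likely to be truncated. This is only a sketch: the 4-characters-per-token ratio is a crude assumption and not Cohere's tokenizer, "32K" is taken as 32,000 tokens, and all names are hypothetical.

```typescript
// Rough pre-flight check against the per-document context shared by query and document.
// ASSUMPTION: ~4 characters per token is a crude heuristic, not Cohere's tokenizer.
const CONTEXT_TOKENS = 32_000; // "32K" taken as 32,000; the exact limit may differ
const ASSUMED_CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / ASSUMED_CHARS_PER_TOKEN);
}

function fitsInContext(query: string, doc: string, contextTokens = CONTEXT_TOKENS): boolean {
  // Query and document tokens count against the same budget.
  return estimateTokens(query) + estimateTokens(doc) <= contextTokens;
}

// Returns the indexes of candidates that probably exceed the budget and may need chunking.
function findOversizedDocuments(query: string, docs: string[]): number[] {
  const oversized: number[] = [];
  docs.forEach((doc, index) => {
    if (!fitsInContext(query, doc)) oversized.push(index);
  });
  return oversized;
}
```

For example, `findOversizedDocuments(query, candidates)` can run before the rerank call so that oversized candidates are chunked or logged rather than silently cut short.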

When to Use Cohere Rerank 4 Fast

Best For

  • Multilingual reranking: One model covers 100+ languages for cross-lingual retrieval
  • High-throughput search: Lower latency keeps reranker overhead small under load
  • Agentic retrieval loops: Tool calls that retrieve then rerank on every turn stay within tight time budgets
  • Mixed-format corpora: Long text, JSON, tables, and code share one reranking stage
  • Cost-sensitive RAG: A lower per-query price than the Pro variant at scale

Consider Alternatives When

  • Maximum quality: rerank-v4-pro targets state-of-the-art relevance on complex queries
  • English-only corpora: rerank-v3.5 covers English RAG at a lower price point
  • No second stage needed: First-pass retrieval alone meets the accuracy bar
  • Image or multimodal retrieval: Use a multimodal retrieval model instead

Conclusion

Cohere Rerank 4 Fast is the right reranker when multilingual coverage and latency both matter, and a small quality gap versus the Pro variant is acceptable. Route it through AI Gateway with model id cohere/rerank-v4-fast for unified billing across the retrieval and generation stages of the same pipeline.

Frequently Asked Questions

  • How does Cohere Rerank 4 Fast differ from rerank-v4-pro?

    Cohere Rerank 4 Fast is the latency-optimized variant in the Rerank 4 family, built for high-throughput use cases. rerank-v4-pro targets the highest relevance quality on complex queries. Both share the same multilingual coverage and the same per-document context of 32K tokens.

  • Which languages does Cohere Rerank 4 Fast support?

    More than 100 languages, with the same multilingual coverage as Cohere's embed-multilingual family. Cross-lingual queries work in one call, so a query in one language can match documents in another.

  • What document types can Cohere Rerank 4 Fast rerank?

    Long-form text, semi-structured JSON, tables, code, and email-style records. The per-document context window is 32K tokens, shared between the query and the document.

  • How is Cohere Rerank 4 Fast billed on AI Gateway?

    Reranking is priced per search query rather than per token. See the pricing section on this page for the current per-query rate.

  • Do I still need an embedding model with Cohere Rerank 4 Fast?

    Yes, for the first-pass retrieval step. Cohere Rerank 4 Fast scores a candidate set against the query; it does not retrieve from the full corpus. A common pattern is to retrieve 50 to 200 candidates with an embedding model, then rerank them down to a top-k of 5 to 20.

  • How does Cohere Rerank 4 Fast compare to rerank-v3.5?

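    rerank-v3.5 is the previous-generation reranker; its language coverage matches embed-multilingual-v3.0, and this page positions it as the lower-priced option for English-only RAG. Cohere Rerank 4 Fast belongs to the Rerank 4 generation, is explicitly multilingual across 100+ languages, and is the latency-optimized tier of that generation.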

  • Does Cohere Rerank 4 Fast support Zero Data Retention?

    No. Zero Data Retention is offered on a per-provider basis and is not currently available for this model. See https://vercel.com/docs/ai-gateway/capabilities/zdr for details.