Cohere Rerank 4 Fast
Cohere Rerank 4 Fast is a multilingual reranking model from Cohere tuned for low-latency, high-throughput retrieval over English and non-English documents and semi-structured JSON.
import { rerank } from 'ai';
const result = await rerank({ model: 'cohere/rerank-v4-fast', query: 'What is the capital of France?', documents: [ 'Paris is the capital of France.', 'Berlin is the capital of Germany.', 'Madrid is the capital of Spain.', ],})About Cohere Rerank 4 Fast
Cohere Rerank 4 Fast is the fast tier of Cohere's Rerank 4 generation, released December 11, 2025 alongside rerank-v4-pro. It shares the v4 family's multilingual coverage and JSON support but is tuned for lower per-query latency and higher throughput.
Like its siblings, Cohere Rerank 4 Fast is a cross-encoder. It reads the query and each candidate document together through attention, scoring relevance directly rather than relying on independent vector similarity. That structure picks up signal on complex, multi-part, or ambiguous queries that bi-encoder embeddings flatten.
Cohere Rerank 4 Fast covers more than 100 languages, including the same multilingual coverage as Cohere's embed-multilingual family. A French query can match Japanese documents and vice versa under one model. It handles long-form text, tables, code, and semi-structured JSON records with the same per-document context window of 32K tokens.
The fast variant fits at the front of high-traffic search systems and agentic retrieval steps where the reranker runs on every user turn. It pairs naturally with a multilingual embedding retriever for first-pass candidate selection and trades some quality versus rerank-v4-pro for response time under load.
See https://cohere.com/blog/rerank-4 for the request format, including top_n, return_documents, and per-document tokenization rules. Reranking is billed per search query rather than per token.