Cohere Rerank 4 Fast

Cohere Rerank 4 Fast is a multilingual reranking model from Cohere tuned for low-latency, high-throughput retrieval over English and non-English documents and semi-structured JSON.

Rerank

index.ts

import { rerank } from 'ai';

const result = await rerank({
  model: 'cohere/rerank-v4-fast',
  query: 'What is the capital of France?',
  documents: [
    'Paris is the capital of France.',
    'Berlin is the capital of Germany.',
    'Madrid is the capital of Spain.',
  ],
});
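A minimal sketch of how a reranked result might be consumed, assuming the result can be reduced to a list of entries that each carry the document's original index and a relevance score. The `RankedEntry` shape and the `topDocuments` helper are hypothetical names used for illustration, not confirmed SDK fields; check the SDK reference for the actual result structure.

```typescript
// Hypothetical shape: one entry per scored document.
interface RankedEntry {
  originalIndex: number; // position in the input `documents` array
  score: number; // relevance score, higher means more relevant
}

// Return the `n` highest-scoring documents in descending score order.
function topDocuments(documents: string[], ranking: RankedEntry[], n: number): string[] {
  return [...ranking]
    .sort((a, b) => b.score - a.score)
    .slice(0, n)
    .map((entry) => documents[entry.originalIndex]);
}

const sampleDocuments = [
  'Paris is the capital of France.',
  'Berlin is the capital of Germany.',
  'Madrid is the capital of Spain.',
];

// Illustrative scores only, not real model output.
const sampleRanking: RankedEntry[] = [
  { originalIndex: 1, score: 0.12 },
  { originalIndex: 0, score: 0.98 },
  { originalIndex: 2, score: 0.09 },
];

const topTwo = topDocuments(sampleDocuments, sampleRanking, 2);
```

Sorting a copy of the ranking keeps the helper independent of whatever order the entries arrive in.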


Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

Provider	Context	Input	Release Date	Legal
Cohere	32K	$2/K	12/11/2025	Terms • Privacy

More models by Cohere

Model	Context	Latency	Throughput	Input	Output	Release Date
cohere/rerank-v4-pro	32K	-	-	$2.50/K	-	12/11/2025
cohere/embed-v4.0	128K	-	-	$0.12/M	-	04/15/2025
cohere/command-a	256K	0.2s	73tps	$2.50/M	$10/M	03/13/2025
cohere/rerank-v3.5	-	-	-	$2/K	-	12/02/2024

About Cohere Rerank 4 Fast

Cohere Rerank 4 Fast is the fast tier of Cohere's Rerank 4 generation, released December 11, 2025 alongside rerank-v4-pro. It shares the v4 family's multilingual coverage and JSON support but is tuned for lower per-query latency and higher throughput.

Like its siblings, Cohere Rerank 4 Fast is a cross-encoder. It reads the query and each candidate document together through attention, scoring relevance directly rather than relying on independent vector similarity. That structure picks up signal on complex, multi-part, or ambiguous queries that bi-encoder embeddings flatten.
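To make the contrast concrete, here is a small sketch of the bi-encoder side: the query and the document are embedded independently, and relevance is a fixed similarity function over the two vectors. The type names are hypothetical and only illustrate the difference in inputs; a cross-encoder instead receives the query and document text together and returns a score directly.

```typescript
// Bi-encoder style scoring: compare two independently computed vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as unrelated.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical signatures contrasting the two scoring styles.
type BiEncoderScore = (queryVector: number[], docVector: number[]) => number;
type CrossEncoderScore = (query: string, doc: string) => Promise<number>;

const biEncoderScore: BiEncoderScore = cosineSimilarity;
```

Because the bi-encoder never sees the query and document together, any interaction between their terms has to survive compression into the two vectors.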

Cohere Rerank 4 Fast covers more than 100 languages, matching the multilingual coverage of Cohere's embed-multilingual family. A French query can match Japanese documents and vice versa under one model. It handles long-form text, tables, code, and semi-structured JSON records within a per-document context window of 32K tokens.
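One possible way to prepare semi-structured records for reranking, assuming documents are passed as plain strings, is to serialize the relevant fields in a fixed order. The `recordToDocument` helper and the "key: value" layout are illustrative assumptions, not a format prescribed by Cohere or the gateway.

```typescript
type JsonRecord = Record<string, string | number | boolean | null>;

// Serialize selected fields as "key: value" lines in a fixed order, so every
// record presents the same layout. Missing and null fields are skipped.
function recordToDocument(record: JsonRecord, fields: string[]): string {
  return fields
    .filter((field) => record[field] !== undefined && record[field] !== null)
    .map((field) => `${field}: ${String(record[field])}`)
    .join('\n');
}

// Hypothetical example record.
const ticket: JsonRecord = {
  id: 42,
  title: 'Login fails on mobile',
  state: 'open',
  assignee: null,
};

const ticketText = recordToDocument(ticket, ['title', 'state', 'assignee']);
```

Choosing the fields explicitly keeps identifiers and other low-signal values out of the text being scored.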

The fast variant fits at the front of high-traffic search systems and in agentic retrieval steps where the reranker runs on every user turn. It pairs naturally with a multilingual embedding retriever for first-pass candidate selection, and it trades some quality relative to rerank-v4-pro for faster response under load.
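The two-stage pattern described here can be sketched as a small pipeline: a cheap first pass narrows the corpus to `k` candidates, then a reranker scores only those. Everything below is a hypothetical illustration. `keywordRetriever` is a naive stand-in for a real embedding or hybrid retriever, and the `Reranker` type is a placeholder for whatever call reaches the model.

```typescript
// Hypothetical function types for the two stages.
type Retriever = (query: string, corpus: string[], k: number) => string[];
type Reranker = (query: string, candidates: string[]) => Promise<number[]>; // one score per candidate

// Stage 1 stand-in: rank documents by how many query terms they contain.
const keywordRetriever: Retriever = (query, corpus, k) => {
  const terms = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
  return corpus
    .map((doc) => ({
      doc,
      hits: doc.toLowerCase().split(/\W+/).filter((word) => terms.has(word)).length,
    }))
    .sort((a, b) => b.hits - a.hits)
    .slice(0, k)
    .map((item) => item.doc);
};

// Stage 2: score the candidate pool and keep the best `topN`.
async function retrieveThenRerank(
  query: string,
  corpus: string[],
  retrieve: Retriever,
  rerankFn: Reranker,
  k: number,
  topN: number,
): Promise<string[]> {
  const candidates = retrieve(query, corpus, k);
  const scores = await rerankFn(query, candidates);
  return candidates
    .map((doc, i) => ({ doc, score: scores[i] }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topN)
    .map((item) => item.doc);
}
```

Keeping `k` small bounds the number of documents the reranker sees per query, which is where the latency budget of a per-turn reranker is spent.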

See https://cohere.com/blog/rerank-4 for the request format, including top_n, return_documents, and per-document tokenization rules. Reranking is billed per search query rather than per token.
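As a rough worked example of per-query billing, the sketch below estimates cost from query volume. It assumes the "$2/K" and "$2.50/K" prices listed on this page mean US dollars per 1,000 search queries; how a provider counts a billable search (for example, when a single query carries many documents) is not covered here and should be checked against the provider's pricing documentation.

```typescript
// Listed prices, read as USD per 1,000 search queries (an assumption).
const PRICE_PER_1K_QUERIES = {
  'cohere/rerank-v4-fast': 2.0,
  'cohere/rerank-v4-pro': 2.5,
} as const;

type RerankModelId = keyof typeof PRICE_PER_1K_QUERIES;

// Cost depends on the number of queries, not on token counts.
function estimateRerankCostUsd(model: RerankModelId, queries: number): number {
  return (queries / 1000) * PRICE_PER_1K_QUERIES[model];
}

const fastCostUsd = estimateRerankCostUsd('cohere/rerank-v4-fast', 1_000_000); // 2000
const proCostUsd = estimateRerankCostUsd('cohere/rerank-v4-pro', 1_000_000); // 2500
```

Under these assumptions, one million queries differ by $500 between the two tiers.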

What To Consider When Choosing a Provider

Configuration: Rerankers run after an initial retrieval step. Pair Cohere Rerank 4 Fast with an embedding model, keyword search, or hybrid retriever so it has a candidate pool to score. The per-document context of 32K tokens covers query and document tokens together, which is wide enough to score long passages and chunked enterprise documents without truncation in most cases.
Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
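Since the 32K budget covers query and document tokens together, a pre-flight check can flag documents that are likely to be truncated. This is a sketch under two stated assumptions: "32K" is treated as 32,000 tokens, and token counts are approximated with a crude four-characters-per-token heuristic rather than the model's tokenizer. The helper names are hypothetical.

```typescript
const CONTEXT_TOKENS = 32_000; // "32K" from this page; the exact limit is an assumption
const CHARS_PER_TOKEN = 4; // crude heuristic, not the model's tokenizer

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// True when the query and document together fit the estimated budget.
function fitsContext(query: string, doc: string): boolean {
  return estimateTokens(query) + estimateTokens(doc) <= CONTEXT_TOKENS;
}

// Split an oversized document into chunks that each fit alongside the query.
function chunkToFit(query: string, doc: string): string[] {
  const budgetChars = (CONTEXT_TOKENS - estimateTokens(query)) * CHARS_PER_TOKEN;
  if (budgetChars <= 0) throw new Error('Query alone exceeds the context budget');
  const chunks: string[] = [];
  for (let i = 0; i < doc.length; i += budgetChars) {
    chunks.push(doc.slice(i, i + budgetChars));
  }
  return chunks;
}
```

Fixed-size character splits ignore sentence and section boundaries, so a production pipeline would likely chunk on document structure instead.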

When to Use Cohere Rerank 4 Fast

Best for

Multilingual reranking: One model covers 100+ languages for cross-lingual retrieval
High-throughput search: Lower latency keeps reranker overhead small under load
Agentic retrieval loops: Tool calls that retrieve then rerank on every turn stay within tight time budgets
Mixed-format corpora: Long text, JSON, tables, and code share one reranking stage
Cost-sensitive RAG: A lower per-query price than the Pro variant at scale

Consider alternatives when

Maximum quality: rerank-v4-pro targets state-of-the-art relevance on complex queries
English-only corpora: rerank-v3.5 covers English RAG; this page lists it at the same $2/K input price, so compare quality and latency rather than cost
No second stage needed: First-pass retrieval alone meets the accuracy bar
Image or multimodal retrieval: Use a multimodal retrieval model instead

Conclusion

Cohere Rerank 4 Fast is the right reranker when multilingual coverage and latency both matter, and a small quality gap versus the Pro variant is acceptable. Route it through AI Gateway with model id cohere/rerank-v4-fast for unified billing across the retrieval and generation stages of the same pipeline.
