
Cohere Rerank 4 Fast

Cohere Rerank 4 Fast is a multilingual reranking model from Cohere tuned for low-latency, high-throughput retrieval over English and non-English documents and semi-structured JSON.

Rerank
index.ts
import { rerank } from 'ai';

const result = await rerank({
  model: 'cohere/rerank-v4-fast',
  query: 'What is the capital of France?',
  documents: [
    'Paris is the capital of France.',
    'Berlin is the capital of Germany.',
    'Madrid is the capital of Spain.',
  ],
});

Frequently Asked Questions

  • How does Cohere Rerank 4 Fast differ from rerank-v4-pro?

    Cohere Rerank 4 Fast is the latency-optimized variant in the Rerank 4 family, tuned for low-latency, high-throughput use cases. rerank-v4-pro targets the highest relevance quality on complex queries. Both share the same multilingual coverage and per-document context of 32K tokens.

  • Which languages does Cohere Rerank 4 Fast support?

    More than 100 languages, with the same multilingual coverage as Cohere's embed-multilingual family. Cross-lingual queries work in one call, so a query in one language can match documents in another.

  • What document types can Cohere Rerank 4 Fast rerank?

    Long-form text, semi-structured JSON, tables, code, and email-style records. The per-document context window is 32K tokens, shared between the query and the document.
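Because the query and the document share one window, a long query leaves less room for each document. The sketch below is a rough pre-flight check, not an exact measurement. The helper names are hypothetical, the 4-characters-per-token ratio is an assumed heuristic that will be off for many languages and for dense JSON, and "32K" is read here as 32,000 tokens.

```typescript
// Assumption: roughly 4 characters per token. Real tokenizer counts will differ.
const CHARS_PER_TOKEN = 4;
// Assumption: "32K" is interpreted as 32,000 tokens.
const CONTEXT_TOKENS = 32000;

type RerankDocument = string | Record<string, unknown>;

// Estimate the token count of a string from its character length.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Turn a plain string or a JSON-like record into the text whose length is measured.
function toDocumentText(doc: RerankDocument): string {
  return typeof doc === 'string' ? doc : JSON.stringify(doc);
}

// True when the estimated query tokens plus document tokens fit in the shared window.
function fitsContext(query: string, doc: RerankDocument): boolean {
  const total = estimateTokens(query) + estimateTokens(toDocumentText(doc));
  return total <= CONTEXT_TOKENS;
}
```

Documents that fail the check are candidates for chunking or field trimming before they are sent for reranking.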

  • How is Cohere Rerank 4 Fast billed on AI Gateway?

    Reranking is priced per search query rather than per token. See the pricing section on this page for the current per-query rate.

  • Do I still need an embedding model with Cohere Rerank 4 Fast?

    Yes, for the first-pass retrieval step. Cohere Rerank 4 Fast scores a candidate set against the query; it does not retrieve from the full corpus. A common pattern is to retrieve 50 to 200 candidates with an embedding model, then rerank them down to a top-k of 5 to 20.
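The retrieve-then-rerank pattern described above can be sketched as two stages. This is a minimal illustration under stated assumptions: every type and function name below is hypothetical, the first pass is a brute-force cosine search over precomputed embeddings, and the reranker is passed in as a callback so the sketch does not assume any particular response shape from a reranking API.

```typescript
// Hypothetical types for illustration; none of these come from the AI SDK.
type Candidate = { id: string; text: string; embedding: number[] };
type Scored = { id: string; text: string; score: number };

// Any reranker fits this shape: query and candidate texts in, one score per text out.
type RerankFn = (query: string, texts: string[]) => Promise<number[]>;

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Stage 1: cheap embedding search over the full corpus, keeping a shortlist.
function firstPass(queryEmbedding: number[], corpus: Candidate[], candidateCount: number): Candidate[] {
  return corpus
    .map((doc) => ({ doc, similarity: cosine(queryEmbedding, doc.embedding) }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, candidateCount)
    .map((entry) => entry.doc);
}

// Pair each shortlisted candidate with its reranker score and keep the best k.
function topKByScore(candidates: Candidate[], scores: number[], topK: number): Scored[] {
  return candidates
    .map((c, i) => ({ id: c.id, text: c.text, score: scores[i] }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Stage 2: rerank only the shortlist, never the full corpus.
async function retrieveThenRerank(
  query: string,
  queryEmbedding: number[],
  corpus: Candidate[],
  rerankFn: RerankFn,
  candidateCount = 100, // within the 50 to 200 range mentioned above
  topK = 10, // within the 5 to 20 range mentioned above
): Promise<Scored[]> {
  const shortlist = firstPass(queryEmbedding, corpus, candidateCount);
  const scores = await rerankFn(query, shortlist.map((c) => c.text));
  return topKByScore(shortlist, scores, topK);
}
```

In practice the first pass would usually be a vector database query rather than an in-memory scan, and `rerankFn` would wrap a call to the reranking model. Keeping the reranker behind a callback also makes the pipeline easy to test with a stub.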

  • How does Cohere Rerank 4 Fast compare to rerank-v3.5?

    rerank-v3.5 is the previous-generation model. It handles English plus the non-English languages covered by embed-multilingual-v3.0. Cohere Rerank 4 Fast belongs to the Rerank 4 generation, is explicitly multilingual across 100+ languages, and is the latency-optimized counterpart to rerank-v4-pro.

  • Does Cohere Rerank 4 Fast support Zero Data Retention?

    Not currently. Zero Data Retention is offered on a per-provider basis and is not available for this model. See https://vercel.com/docs/ai-gateway/capabilities/zdr for details.