Cohere Rerank 3.5

Cohere Rerank 3.5 is a reranking model from Cohere that reorders retrieved English documents and semi-structured JSON by semantic relevance to a query, sharpening the top-k results in a RAG pipeline.

Rerank
index.ts
import { rerank } from 'ai';

const result = await rerank({
  model: 'cohere/rerank-v3.5',
  query: 'What is the capital of France?',
  documents: [
    'Paris is the capital of France.',
    'Berlin is the capital of Germany.',
    'Madrid is the capital of Spain.',
  ],
});

About Cohere Rerank 3.5

Cohere Rerank 3.5 is the December 2024 update to the Cohere Rerank family. Cohere positions it as a reasoning-focused reranker for complex enterprise search where the difference between the right document and a near-miss matters.

Reranking is a cross-encoder step that runs after first-pass retrieval. A vector search, BM25 index, or hybrid retriever returns a candidate set, then Cohere Rerank 3.5 scores each candidate against the query and returns a relevance-ordered list. Cross-attention between the query and the full document text picks up signal that bi-encoder embedding similarity misses on under-specified or multi-part queries.
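The shape of that second stage can be sketched without calling any service. In the snippet below, `rerankCandidates`, `Scorer`, and `termOverlap` are hypothetical names introduced for illustration, and the term-overlap scorer is only a toy stand-in for the model's cross-encoder score, not how Cohere Rerank 3.5 computes relevance.

```typescript
// Hypothetical types and helpers for illustration; not part of any SDK.
type Scorer = (query: string, document: string) => number;

interface RankedDocument {
  index: number; // position in the original candidate list
  document: string;
  score: number;
}

// Score every candidate against the query, then sort by descending score.
function rerankCandidates(
  query: string,
  candidates: string[],
  score: Scorer,
): RankedDocument[] {
  return candidates
    .map((document, index) => ({ index, document, score: score(query, document) }))
    .sort((a, b) => b.score - a.score);
}

// Toy stand-in for the model: fraction of query terms present in the document.
const termOverlap: Scorer = (query, document) => {
  const terms = query.toLowerCase().split(/\W+/).filter(Boolean);
  const words = new Set(document.toLowerCase().split(/\W+/).filter(Boolean));
  if (terms.length === 0) return 0;
  return terms.filter((term) => words.has(term)).length / terms.length;
};

// Candidates as a first-pass retriever might return them, in arbitrary order.
const firstPass = [
  'Berlin is the capital of Germany.',
  'Paris is the capital of France.',
  'Madrid is the capital of Spain.',
];

const ranked = rerankCandidates('capital of France', firstPass, termOverlap);
console.log(ranked.map((r) => r.document));
```

In a real pipeline the scorer would be replaced by a call to the reranking model; the surrounding map-and-sort logic stays the same.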

Cohere Rerank 3.5 handles English long-form text and semi-structured data including JSON, tables, and email-style records. The per-document context window is 4.1K tokens, shared between query and document. Documents that exceed the limit are chunked automatically and the highest-scoring chunk drives the document's final rank.
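The "highest-scoring chunk wins" rule can be illustrated with a minimal sketch. How the service actually splits long documents is not described here, so the word-based splitter below is an assumption, words stand in for tokens, and the sketch ignores the share of the window taken by the query. `chunkByWords`, `scoreLongDocument`, and the toy scorer are hypothetical names.

```typescript
// Hypothetical helpers for illustration; the real chunking strategy is not specified here.
type ChunkScorer = (query: string, chunk: string) => number;

// Split text into chunks of at most maxWords whitespace-separated words.
function chunkByWords(text: string, maxWords: number): string[] {
  if (maxWords < 1) throw new Error('maxWords must be at least 1');
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push(words.slice(i, i + maxWords).join(' '));
  }
  return chunks;
}

// A long document's score is the maximum score over its chunks.
function scoreLongDocument(
  query: string,
  document: string,
  maxWords: number,
  score: ChunkScorer,
): number {
  const chunks = chunkByWords(document, maxWords);
  if (chunks.length === 0) return 0;
  return Math.max(...chunks.map((chunk) => score(query, chunk)));
}

// Toy scorer: 1 if the chunk mentions the query string, otherwise 0.
const mentionsQuery: ChunkScorer = (query, chunk) =>
  chunk.toLowerCase().includes(query.toLowerCase()) ? 1 : 0;

const longDocument = 'Germany has Berlin. Spain has Madrid. France has Paris.';
const documentScore = scoreLongDocument('paris', longDocument, 3, mentionsQuery);
console.log(documentScore);
```

Taking the maximum means a single relevant passage is enough to lift a long document, even when the rest of it is off-topic.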

In a RAG pipeline, the common pattern is to retrieve 50 to 200 candidates with an embedding model and then rerank them down to the top 5 to 20 documents passed to the generative model. That reduces the prompt token count sent to the LLM while improving the quality of the context, which often offsets the reranking call's cost.
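A rough back-of-the-envelope check of the token savings, as a sketch: the figures below (100 candidates, 500 tokens each, top 10 kept) are illustrative assumptions rather than measurements, and `keepTopN` and `totalTokens` are hypothetical helpers.

```typescript
// Hypothetical record for a candidate that has already been scored by a reranker.
interface ScoredCandidate {
  text: string;
  score: number;  // relevance score from the reranking step
  tokens: number; // estimated token count of the document
}

// Keep the topN highest-scoring candidates without mutating the input.
function keepTopN(scored: ScoredCandidate[], topN: number): ScoredCandidate[] {
  return [...scored].sort((a, b) => b.score - a.score).slice(0, topN);
}

// Sum the estimated tokens that would be placed in the prompt.
function totalTokens(docs: ScoredCandidate[]): number {
  return docs.reduce((sum, doc) => sum + doc.tokens, 0);
}

// 100 made-up candidates at 500 tokens each, with synthetic scores.
const candidates: ScoredCandidate[] = Array.from({ length: 100 }, (_, i) => ({
  text: `doc-${i}`,
  score: i / 100,
  tokens: 500,
}));

const context = keepTopN(candidates, 10);
console.log(totalTokens(candidates)); // 50000 tokens if every candidate were sent
console.log(totalTokens(context));    // 5000 tokens after keeping the top 10
```

Under these assumed numbers, the prompt shrinks by a factor of ten, which is the kind of saving that can offset the cost of the reranking call.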

See https://aws.amazon.com/blogs/machine-learning/cohere-rerank-3-5-is-now-available-in-amazon-bedrock-through-rerank-api/ for the API contract, including how to format queries, documents, and the optional return_documents and top_n parameters.