Cohere Rerank 3.5
Cohere Rerank 3.5 is a reranking model from Cohere that reorders retrieved English documents and semi-structured JSON by semantic relevance to a query, sharpening the top-k results in a RAG pipeline.
import { rerank } from 'ai';

const result = await rerank({
  model: 'cohere/rerank-v3.5',
  query: 'What is the capital of France?',
  documents: [
    'Paris is the capital of France.',
    'Berlin is the capital of Germany.',
    'Madrid is the capital of Spain.',
  ],
});

Frequently Asked Questions
What does Cohere Rerank 3.5 actually do?
It takes a query and a list of candidate documents, scores each document for semantic relevance to the query, and returns the list reordered. The model itself doesn't retrieve; it refines results from a first-pass retriever like a vector index or BM25 search.
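The score-and-reorder step can be illustrated with a small sketch. The function name `reorderByScore` and the field names `originalIndex` and `score` are assumptions made for this example, not a statement of the response shape the model or any SDK returns; the scores are supplied by the caller here rather than produced by the model.

```typescript
// Hypothetical input: each candidate document paired with a relevance score.
interface ScoredDocument<T> {
  document: T;
  score: number;
}

// Hypothetical output: the same entry plus its position in the input list.
interface RankedDocument<T> extends ScoredDocument<T> {
  originalIndex: number;
}

// Return the candidates ordered from most to least relevant.
// The input array is left untouched; ties keep their input order
// because Array.prototype.sort is stable.
function reorderByScore<T>(scored: ScoredDocument<T>[]): RankedDocument<T>[] {
  return scored
    .map((entry, originalIndex) => ({
      document: entry.document,
      score: entry.score,
      originalIndex,
    }))
    .sort((a, b) => b.score - a.score);
}
```

Keeping the original index alongside each entry makes it possible to map a reranked result back to whatever metadata the first-pass retriever returned.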
How does Cohere Rerank 3.5 differ from an embedding model?
Embedding models produce one vector per document and one per query. Document vectors are computed ahead of time, so scoring at query time is a cheap vector comparison. Cohere Rerank 3.5 instead reads the query and a candidate document together through cross-attention, which catches relevance signals that bi-encoder similarity misses on complex queries. The tradeoff is that reranking runs once per candidate at query time, so it's used on a shortlist rather than the full corpus.
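A minimal sketch of the two scoring paths, to make the cost difference concrete. Everything here is illustrative: `IndexedDocument`, `biEncoderRank`, `crossEncoderRank`, and the `PairScorer` callback are hypothetical names, and the callback stands in for a model call rather than reproducing how Cohere Rerank 3.5 works internally.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('vectors must have equal length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Hypothetical index entry: the vector is computed ahead of time.
interface IndexedDocument {
  id: string;
  text: string;
  vector: number[];
}

// Bi-encoder path: compare the query vector against stored document vectors.
function biEncoderRank(queryVector: number[], index: IndexedDocument[]): IndexedDocument[] {
  return index
    .slice()
    .sort(
      (x, y) =>
        cosineSimilarity(queryVector, y.vector) - cosineSimilarity(queryVector, x.vector),
    );
}

// Cross-encoder path: a scorer that sees the query and the document together.
// Synchronous here for brevity; a real scorer would be a network call.
type PairScorer = (query: string, document: string) => number;

function crossEncoderRank(
  query: string,
  shortlist: IndexedDocument[],
  scorePair: PairScorer,
): IndexedDocument[] {
  return shortlist
    .map((doc) => ({ doc, score: scorePair(query, doc.text) })) // one scoring call per candidate
    .sort((x, y) => y.score - x.score)
    .map((entry) => entry.doc);
}
```

The bi-encoder path does no model work per document at query time, while the cross-encoder path invokes the scorer once for every candidate, which is why it is applied to a shortlist.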
What types of documents does Cohere Rerank 3.5 support?
English-language text including long-form documents, semi-structured JSON, tables, and email-style records. The per-document context window is 4.1K tokens, shared between the query and the document.
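Because the query and the document share one window, a long query leaves less room for each document. The sketch below shows one possible client-side way to split an oversized document so each piece fits the remaining budget. It rests on loud assumptions: `approxTokenCount` counts whitespace-separated words, which is only a crude stand-in for the model's real tokenizer, and `chunkForRerank` is a hypothetical helper. How the service itself handles documents that exceed the window is not covered here.

```typescript
// Crude token estimate: whitespace-separated words.
// ASSUMPTION: a real tokenizer will produce different (usually higher) counts.
function approxTokenCount(text: string): number {
  const trimmed = text.trim();
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length;
}

// Split a document into pieces that fit in the window next to the query.
// windowTokens is supplied by the caller.
function chunkForRerank(query: string, document: string, windowTokens: number): string[] {
  const budget = windowTokens - approxTokenCount(query);
  if (budget <= 0) {
    throw new Error('The query alone fills the context window');
  }
  const words = document
    .trim()
    .split(/\s+/)
    .filter((word) => word.length > 0);
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += budget) {
    chunks.push(words.slice(start, start + budget).join(' '));
  }
  return chunks;
}
```

Each chunk can then be submitted as its own candidate, with the best chunk score standing in for the parent document if that suits the application.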
How many documents should I send per rerank call?
Typical pipelines retrieve 50 to 200 candidates from a first-pass index and rerank them down to a top-k of 5 to 20 for the generative model. The exact numbers depend on your latency budget and how noisy the first-pass retriever is.
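The retrieve-then-rerank flow can be sketched with both stages injected as functions. `Retriever`, `RerankScorer`, and `retrieveThenRerank` are hypothetical names for this example, and the stages are synchronous only to keep the sketch short; real retrievers and rerankers are network calls that would be awaited. The defaults of 100 candidates and a top-k of 10 are just one point inside the ranges mentioned above.

```typescript
// First stage: return up to `limit` candidates for the query.
type Retriever = (query: string, limit: number) => string[];

// Second stage: return one relevance score per candidate, in input order.
type RerankScorer = (query: string, documents: string[]) => number[];

interface PipelineOptions {
  candidateCount: number; // how many documents to pull from the first-pass index
  topK: number; // how many to keep for the generative model
}

function retrieveThenRerank(
  query: string,
  retrieve: Retriever,
  scoreAll: RerankScorer,
  options: PipelineOptions = { candidateCount: 100, topK: 10 },
): string[] {
  const candidates = retrieve(query, options.candidateCount);
  const scores = scoreAll(query, candidates);
  if (scores.length !== candidates.length) {
    throw new Error('Expected exactly one score per candidate');
  }
  return candidates
    .map((document, i) => ({ document, score: scores[i] ?? 0 }))
    .sort((a, b) => b.score - a.score) // most relevant first
    .slice(0, options.topK)
    .map((entry) => entry.document);
}
```

Raising `candidateCount` gives the reranker more chances to recover a document the first pass ranked poorly, at the cost of more scoring work per query.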
Can I use Cohere Rerank 3.5 without a separate embedding model?
Yes. You can pair it with any retriever, including BM25 or hybrid search. An embedding model is the most common first stage, but Cohere Rerank 3.5 only needs a candidate set, not a specific retrieval method.
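To illustrate a first stage that uses no embeddings, here is a compact in-memory BM25 scorer that produces a candidate shortlist. It is a sketch under stated assumptions: the tokenizer is a naive lowercase split, `k1 = 1.2` and `b = 0.75` are commonly used defaults rather than tuned values, and `bm25Scores` and `bm25Candidates` are hypothetical helper names. A production system would normally rely on a search engine's built-in BM25 instead.

```typescript
// Naive tokenizer: lowercase, split on anything that is not a letter or digit.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

// Okapi BM25 score of every document against the query.
function bm25Scores(query: string, documents: string[], k1 = 1.2, b = 0.75): number[] {
  const docs = documents.map(tokenize);
  const n = docs.length;
  if (n === 0) return [];
  const avgLen = docs.reduce((sum, d) => sum + d.length, 0) / n;

  // Document frequency: number of documents containing each term.
  const df = new Map<string, number>();
  for (const d of docs) {
    for (const term of Array.from(new Set(d))) {
      df.set(term, (df.get(term) ?? 0) + 1);
    }
  }

  const queryTerms = Array.from(new Set(tokenize(query)));
  return docs.map((d) => {
    // Term frequency within this document.
    const tf = new Map<string, number>();
    for (const term of d) tf.set(term, (tf.get(term) ?? 0) + 1);

    let score = 0;
    for (const term of queryTerms) {
      const f = tf.get(term) ?? 0;
      if (f === 0) continue;
      const nq = df.get(term) ?? 0;
      const idf = Math.log(1 + (n - nq + 0.5) / (nq + 0.5));
      score += (idf * f * (k1 + 1)) / (f + k1 * (1 - b + (b * d.length) / avgLen));
    }
    return score;
  });
}

// Return the `limit` best-scoring documents as a shortlist for reranking.
function bm25Candidates(query: string, documents: string[], limit: number): string[] {
  const scores = bm25Scores(query, documents);
  return documents
    .map((document, i) => ({ document, score: scores[i] ?? 0 }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((entry) => entry.document);
}
```

The shortlist returned by `bm25Candidates` is plain text, so it can be passed as the `documents` argument of a rerank call without any vector index in the loop.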
How much does Cohere Rerank 3.5 cost on AI Gateway?
Reranking is billed per search query. See the pricing section on this page for the current per-query rate on AI Gateway.
Does Cohere Rerank 3.5 support Zero Data Retention?
Yes. Zero Data Retention is available for this model and is offered on a per-provider basis. See https://vercel.com/docs/ai-gateway/capabilities/zdr for details.