Cohere Rerank 3.5
Cohere Rerank 3.5 is a reranking model from Cohere that reorders retrieved English documents and semi-structured JSON by semantic relevance to a query, sharpening the top-k results in a RAG pipeline.
import { rerank } from 'ai';

const result = await rerank({
  model: 'cohere/rerank-v3.5',
  query: 'What is the capital of France?',
  documents: [
    'Paris is the capital of France.',
    'Berlin is the capital of Germany.',
    'Madrid is the capital of Spain.',
  ],
});

Providers
Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
About Cohere Rerank 3.5
Cohere Rerank 3.5 is the December 2024 update to the Cohere Rerank family. Cohere positions it as a reasoning-focused reranker for complex enterprise search where the difference between the right document and a near-miss matters.
Reranking is a cross-encoder step that runs after first-pass retrieval. A vector search, BM25 index, or hybrid retriever returns a candidate set, then Cohere Rerank 3.5 scores each candidate against the query and returns a relevance-ordered list. Cross-attention between the query and the full document text picks up signal that bi-encoder embedding similarity misses on under-specified or multi-part queries.
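The retrieve-then-rerank flow described above can be sketched in a few lines. This is a minimal illustration, not the model's implementation: `Candidate`, `Scorer`, `rerankCandidates`, and `termOverlap` are hypothetical names, and the term-overlap scorer is only a runnable stand-in for the call to the reranking model.

```typescript
// A candidate returned by any first-pass retriever (vector, BM25, or hybrid).
interface Candidate {
  id: string;
  text: string;
}

interface Ranked extends Candidate {
  score: number;
}

// Hypothetical scorer signature: one relevance score per (query, document) pair.
// In a real pipeline this would be the call to the reranking model.
type Scorer = (query: string, document: string) => number;

// Score every candidate against the query and return them best-first.
function rerankCandidates(query: string, candidates: Candidate[], score: Scorer): Ranked[] {
  return candidates
    .map((c) => ({ ...c, score: score(query, c.text) }))
    .sort((a, b) => b.score - a.score);
}

// Toy stand-in scorer: fraction of query terms that appear in the document.
// This is NOT how a cross-encoder works; it only makes the sketch runnable.
const termOverlap: Scorer = (query, document) => {
  const terms = query.toLowerCase().split(/\W+/).filter(Boolean);
  if (terms.length === 0) return 0;
  const words = new Set(document.toLowerCase().split(/\W+/).filter(Boolean));
  return terms.filter((t) => words.has(t)).length / terms.length;
};

const firstPass: Candidate[] = [
  { id: 'a', text: 'Berlin is the capital of Germany.' },
  { id: 'b', text: 'Paris is the capital of France.' },
  { id: 'c', text: 'France borders Spain.' },
];

const reranked = rerankCandidates('capital of France', firstPass, termOverlap);
```

Keeping the scorer as a parameter means the first-pass retriever and the reranker stay independent, so either stage can be swapped without touching the other.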
Cohere Rerank 3.5 handles English long-form text and semi-structured data including JSON, tables, and email-style records. The per-document context window is 4.1K tokens, shared between query and document. Documents that exceed the limit are chunked automatically and the highest-scoring chunk drives the document's final rank.
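The "highest-scoring chunk wins" behavior amounts to a max over chunk scores. The sketch below illustrates that aggregation only; it is not the service's chunking logic. `chunkByWords`, `scoreLongDocument`, and `containsQuery` are hypothetical helpers, and words are used as a rough stand-in for tokens. Because the window is shared with the query, a real chunk budget would need to leave room for the query's tokens.

```typescript
// Hypothetical scorer: relevance of one chunk to the query.
type ChunkScorer = (query: string, chunk: string) => number;

// Split text into chunks of at most `maxWords` words.
// Words are a rough stand-in for tokens; a real tokenizer counts differently.
function chunkByWords(text: string, maxWords: number): string[] {
  if (maxWords < 1) throw new Error('maxWords must be at least 1');
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push(words.slice(i, i + maxWords).join(' '));
  }
  return chunks;
}

// Document score = score of its best chunk (max over chunks).
function scoreLongDocument(
  query: string,
  text: string,
  maxWords: number,
  score: ChunkScorer,
): number {
  const chunks = chunkByWords(text, maxWords);
  if (chunks.length === 0) return 0;
  return Math.max(...chunks.map((chunk) => score(query, chunk)));
}

// Toy stand-in scorer so the sketch runs: 1 if the chunk mentions the query, else 0.
const containsQuery: ChunkScorer = (query, chunk) =>
  chunk.toLowerCase().includes(query.toLowerCase()) ? 1 : 0;
```

Taking the maximum rather than the mean means one highly relevant passage is enough to surface a long document, even when the rest of it is off-topic.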
In a RAG pipeline, the common pattern is to retrieve 50 to 200 candidates with an embedding model and then rerank them down to the top 5 to 20 documents passed to the generative model. That reduces the prompt token count sent to the LLM while improving the quality of the context, which often offsets the reranking call's cost.
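A quick back-of-the-envelope calculation shows the size of the prompt reduction. The figures below (100 candidates, 10 kept, 400 tokens per document) are illustrative assumptions, not measurements, and `contextTokens` is a hypothetical helper.

```typescript
// Estimate the prompt tokens spent on retrieved context.
function contextTokens(documentCount: number, avgTokensPerDocument: number): number {
  return documentCount * avgTokensPerDocument;
}

// Assumed numbers for illustration only.
const withoutRerank = contextTokens(100, 400); // all 100 candidates go to the LLM
const withRerank = contextTokens(10, 400); // only the top 10 after reranking
const tokensSaved = withoutRerank - withRerank;
const reductionRatio = tokensSaved / withoutRerank;
```

With these assumed inputs the context shrinks from 40,000 to 4,000 tokens per request, a 90% reduction. Whether that saving outweighs the reranking call depends on the rates that apply to your own traffic.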
See https://aws.amazon.com/blogs/machine-learning/cohere-rerank-3-5-is-now-available-in-amazon-bedrock-through-rerank-api/ for the API contract, including how to format queries, documents, and the optional return_documents and top_n parameters.
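As a rough sketch of how those parameters fit together, the types below model a request and its results. Only `top_n` and `return_documents` are named in the text above; the remaining field names (`query`, `documents`, `index`, `relevance_score`) and the helpers `buildRequest` and `attachDocuments` are assumptions for illustration, so check the linked page for the exact contract.

```typescript
// Assumed request shape; only `top_n` and `return_documents` are named in the text above.
interface RerankRequest {
  query: string;
  documents: string[];
  top_n?: number; // optional: keep only the N best results
  return_documents?: boolean; // optional: echo document text in the response
}

// Assumed result shape: position in the input array plus a relevance score.
interface RerankResult {
  index: number;
  relevance_score: number;
}

function buildRequest(query: string, documents: string[], topN?: number): RerankRequest {
  if (documents.length === 0) throw new Error('documents must not be empty');
  const request: RerankRequest = { query, documents, return_documents: false };
  if (topN !== undefined) {
    // Clamp so top_n never exceeds the number of documents sent.
    request.top_n = Math.max(1, Math.min(topN, documents.length));
  }
  return request;
}

// When document text is not echoed back, recover it from the original array by index.
function attachDocuments(
  documents: string[],
  results: RerankResult[],
): { text: string; score: number }[] {
  return results
    .filter((r) => r.index >= 0 && r.index < documents.length)
    .map((r) => ({ text: documents[r.index] as string, score: r.relevance_score }));
}
```

Leaving `return_documents` off and joining by index keeps responses small when the caller already holds the document text.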
What To Consider When Choosing a Provider
- Configuration: Rerankers sit after an initial retrieval step, not in place of it. Pair Cohere Rerank 3.5 with an embedding model (or any first-pass search) so it has a candidate pool to score. The per-document context of 4.1K tokens covers both the query and document tokens, so very long documents need chunking before they reach the model.
- Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
When to Use Cohere Rerank 3.5
Best For
- English RAG top-k refinement: Reorder a candidate pool from vector or keyword search into a sharper top-k for the generative model
- JSON and tabular search: Score semi-structured records where field content drives the match
- Long-document corpora: Chunked passages need final ordering before they hit the LLM context window
- Complex enterprise queries: Multi-part questions benefit from cross-attention over the full document
- Hybrid retrieval merging: Unify BM25 and vector candidates under a single relevance score
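For the hybrid case, BM25 and vector scores are not directly comparable, so one simple approach is to drop the first-pass scores, take the union of both candidate lists, and let the reranker assign a single relevance score. The sketch below covers only the merge step; `Hit` and `mergeCandidates` are hypothetical names.

```typescript
interface Hit {
  id: string;
  text: string;
}

// Union of two candidate lists, deduplicated by id, preserving first-seen order.
function mergeCandidates(keywordHits: Hit[], vectorHits: Hit[]): Hit[] {
  const seen = new Map<string, Hit>();
  for (const hit of [...keywordHits, ...vectorHits]) {
    if (!seen.has(hit.id)) seen.set(hit.id, hit);
  }
  return Array.from(seen.values());
}

const bm25Hits: Hit[] = [
  { id: 'doc-1', text: 'Paris is the capital of France.' },
  { id: 'doc-2', text: 'France borders Spain.' },
];
const vectorHits: Hit[] = [
  { id: 'doc-2', text: 'France borders Spain.' },
  { id: 'doc-3', text: 'The Eiffel Tower is in Paris.' },
];

// Each document appears once, so the reranker never scores the same text twice.
const candidatePool = mergeCandidates(bm25Hits, vectorHits);
```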
Consider Alternatives When
- Multilingual workloads: rerank-v4-fast and rerank-v4-pro are tuned for English and non-English documents
- Latency-critical pipelines: A lighter reranker may fit a tighter response budget at high query volume
- Strict cost ceilings: Skip reranking when first-pass retrieval already meets your accuracy bar
- Image or multimodal retrieval: Use a multimodal embedding or retrieval model instead
Conclusion
Cohere Rerank 3.5 is the right reranker when an English RAG pipeline already has a working retrieval step and the next gain has to come from better top-k selection. Route it through AI Gateway with model id cohere/rerank-v3.5 to unify billing alongside the embedding and generative models in the same pipeline.
Frequently Asked Questions
What does Cohere Rerank 3.5 actually do?
It takes a query and a list of candidate documents, scores each document for semantic relevance to the query, and returns the list reordered. The model itself doesn't retrieve; it refines results from a first-pass retriever like a vector index or BM25 search.
How does Cohere Rerank 3.5 differ from an embedding model?
Embedding models produce a vector per document, typically computed offline, and a vector per query; relevance is then a cheap vector comparison at query time. Cohere Rerank 3.5 reads the query and a candidate document together through cross-attention, which catches relevance signal that bi-encoder similarity misses on complex queries. The tradeoff is that reranking runs per-candidate at query time, so it's used on a shortlist rather than the full corpus.
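For contrast, the bi-encoder comparison is usually a single vector operation such as cosine similarity. The sketch below shows that computation; it never sees the query and document text together, which is the information a cross-encoder uses. `cosineSimilarity` is an illustrative helper, and the tiny vectors in the usage lines are made up.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] as number;
    const y = b[i] as number;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  // A zero vector has no direction; treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Made-up 2-dimensional "embeddings" for illustration.
const sameDirection = cosineSimilarity([1, 2], [2, 4]);
const orthogonal = cosineSimilarity([1, 0], [0, 1]);
```

Because this comparison is so cheap, it scales to the full corpus, which is why it serves as the first pass and the reranker handles only the shortlist.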
What types of documents does Cohere Rerank 3.5 support?
English-language text including long-form documents, semi-structured JSON, tables, and email-style records. The per-document context window is 4.1K tokens, shared between the query and the document.
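When semi-structured records are sent as strings, one option is to flatten each record into "key: value" lines with the most important fields first. This is a sketch of one possible preprocessing step, not a documented requirement of the model; `JsonRecord` and `recordToText` are hypothetical names.

```typescript
type FieldValue = string | number | boolean | null;
type JsonRecord = Record<string, FieldValue>;

// Flatten a record into "key: value" lines in a caller-chosen field order.
// Fields that are missing or null are skipped.
function recordToText(record: JsonRecord, fieldOrder: string[]): string {
  return fieldOrder
    .filter((key) => record[key] !== undefined && record[key] !== null)
    .map((key) => `${key}: ${String(record[key])}`)
    .join('\n');
}

// Made-up email-style record for illustration.
const email: JsonRecord = {
  subject: 'Invoice for March',
  from: 'billing@example.com',
  body: 'Please find the invoice attached.',
  cc: null,
};

const emailText = recordToText(email, ['subject', 'from', 'cc', 'body']);
```

Putting high-signal fields first also helps if a long record has to be cut to fit the context window, since the tail is what gets dropped.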
How many documents should I send per rerank call?
Typical pipelines retrieve 50 to 200 candidates from a first-pass index and rerank them down to a top-k of 5 to 20 for the generative model. The exact numbers depend on your latency budget and how noisy the first-pass retriever is.
Can I use Cohere Rerank 3.5 without a separate embedding model?
You can pair it with any retriever, including BM25 or hybrid search. An embedding model is the most common first stage, but Cohere Rerank 3.5 only needs a candidate set, not a specific retrieval method.
How much does Cohere Rerank 3.5 cost on AI Gateway?
Reranking is billed per search query. See the pricing section on this page for the current per-query rate on AI Gateway.
Does Cohere Rerank 3.5 support Zero Data Retention?
Yes. Zero Data Retention is available for this model and is offered on a per-provider basis. See https://vercel.com/docs/ai-gateway/capabilities/zdr for details.