
Mercury 2


Mercury 2 is Inception's diffusion-based reasoning language model. It refines output tokens in parallel, offers tunable reasoning depth and native tool use, and supports a 128K-token context window.

Tool Use · Reasoning
index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'inception/mercury-2',
  prompt: 'Why is the sky blue?',
})

// Consume the stream as it arrives.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}

What To Consider When Choosing a Provider

  • Configuration: Mercury 2's diffusion architecture generates tokens in parallel rather than sequentially. Latency differs from autoregressive models, so factor that into timeout and streaming configurations for latency-sensitive pipelines (a minimal timeout sketch follows this list).
  • Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
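
A minimal sketch of the configuration point above, assuming the AI SDK's streamText call; the 30-second budget is a placeholder, not a recommendation.

import { streamText } from 'ai'

// Diffusion latency profiles differ from autoregressive models, so set an
// explicit abort budget instead of relying on framework defaults.
const result = streamText({
  model: 'inception/mercury-2',
  prompt: 'Summarize the incident report.',
  abortSignal: AbortSignal.timeout(30_000), // placeholder budget; tune per pipeline
})

for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}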

When to Use Mercury 2

Best For

  • Sequential agent loops: Chains of many inference calls need low per-step latency
  • Real-time voice backends: Response delay is perceptible to end users
  • High-throughput coding assistants: Many simultaneous requests processed concurrently
  • Fast structured RAG: Retrieval summarization returned as JSON output (see the structured-output sketch after this list)
  • Token cost optimization: Diffusion-based parallel token refinement reduces per-token inference cost compared to autoregressive models
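
A hedged sketch of the structured RAG pattern above, assuming the AI SDK's generateObject with a zod schema; the schema fields and passages are invented for illustration.

import { generateObject } from 'ai'
import { z } from 'zod'

// Placeholder passages; a real pipeline supplies these from a retriever.
const retrievedPassages = [
  'First retrieved passage text.',
  'Second retrieved passage text.',
]

const { object } = await generateObject({
  model: 'inception/mercury-2',
  schema: z.object({
    answer: z.string(),
    citations: z.array(z.string()),
  }),
  prompt: `Answer from these passages:\n${retrievedPassages.join('\n')}`,
})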

Consider Alternatives When

  • Very long outputs: Prompt plus generated text pushes against the 128K-token context window
  • Domain-specific benchmarks: Evaluation prioritizes specific benchmarks over raw throughput
  • Token-by-token streaming: Pipeline assumes autoregressive generation patterns
  • Multimodal input required: You need image or audio input alongside text reasoning

Conclusion

Mercury 2 brings a different execution model to production reasoning workloads. Diffusion-based parallel refinement keeps throughput high while preserving tool calling, structured output, and tunable reasoning depth. If inference latency or per-call cost limits how you scale your product, use Mercury 2 on Vercel AI Gateway. Open https://ai-sdk.dev/playground/inception:mercury-2 to try it interactively.

Frequently Asked Questions

  • What makes Mercury 2 architecturally different from other reasoning models?

    It uses diffusion instead of autoregressive generation. Mercury 2 starts with a draft of the full response and refines all token positions simultaneously across iterative steps, rather than generating one token at a time left to right. That follows the same conceptual lineage as image and video diffusion models, applied to language.
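
    A toy contrast of the two decoding loops, written for illustration only; it is not Mercury 2's actual decoder.

    // Autoregressive decoding commits one token per step, so steps scale
    // with output length. Diffusion decoding starts from a full-length
    // masked draft and revises every position on each refinement step.
    const vocab = ['the', 'sky', 'is', 'blue']
    const propose = (): string => vocab[Math.floor(Math.random() * vocab.length)]

    function autoregressiveDecode(length: number): string[] {
      const out: string[] = []
      for (let i = 0; i < length; i++) out.push(propose()) // one position per step
      return out
    }

    function diffusionDecode(length: number, steps: number): string[] {
      let draft: string[] = new Array(length).fill('<mask>')
      for (let s = 0; s < steps; s++) {
        draft = draft.map(() => propose()) // every position revised each step
      }
      return draft // step count is independent of output length
    }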

  • How does tunable reasoning depth work in Mercury 2?

    You adjust the number of diffusion refinement steps at inference time. Fewer steps yield faster responses; more steps let the model converge on higher-quality answers. You match compute to task difficulty on each request, as sketched below.
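
    A sketch under assumptions: the AI SDK's providerOptions passthrough is real, but 'diffusionSteps' is a hypothetical option name; check Inception's documentation for the actual parameter.

    import { generateText } from 'ai'

    // 'diffusionSteps' is a hypothetical name used for illustration only.
    const { text } = await generateText({
      model: 'inception/mercury-2',
      prompt: 'Plan a three-step refactor for this module.',
      providerOptions: {
        inception: { diffusionSteps: 8 }, // fewer steps: faster; more: higher quality
      },
    })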

  • What throughput does Mercury 2 achieve compared to autoregressive reasoning models?

    Mercury 2 generates faster than autoregressive approaches. Live throughput metrics appear on this page.

  • Is Mercury 2 compatible with OpenAI client libraries?

    Yes. Mercury 2 exposes an OpenAI-compatible API. Route existing codebases that use the OpenAI SDK to Mercury 2 through AI Gateway by swapping the base URL and model identifier.
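
    A minimal sketch of that swap using the official openai package; the base URL shown is an assumption, so confirm the gateway's current OpenAI-compatible endpoint in the AI Gateway docs.

    import OpenAI from 'openai'

    const client = new OpenAI({
      apiKey: process.env.AI_GATEWAY_API_KEY,
      // Assumed endpoint; verify against the AI Gateway documentation.
      baseURL: 'https://ai-gateway.vercel.sh/v1',
    })

    const completion = await client.chat.completions.create({
      model: 'inception/mercury-2',
      messages: [{ role: 'user', content: 'Why is the sky blue?' }],
    })
    console.log(completion.choices[0]?.message.content)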

  • What context length does Mercury 2 support?

    A context window of 128K tokens. That suits long document processing, extended conversation history, and multi-document retrieval tasks.

  • Does Mercury 2 support structured output for agent orchestration?

    Yes. Mercury 2 includes native schema-aligned JSON output and tool use. You can plug it into function-calling orchestration frameworks without extra parsing middleware.
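
    A hedged sketch of tool calling through the AI SDK; the getWeather tool and its schema are invented for illustration. (AI SDK 5 names the schema field inputSchema; v4 used parameters.)

    import { generateText, tool } from 'ai'
    import { z } from 'zod'

    const { text } = await generateText({
      model: 'inception/mercury-2',
      prompt: 'What is the weather in Berlin?',
      tools: {
        // Hypothetical tool; replace with a real data source.
        getWeather: tool({
          description: 'Get current weather for a city',
          inputSchema: z.object({ city: z.string() }),
          execute: async ({ city }) => ({ city, tempC: 18 }),
        }),
      },
    })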

  • How is Mercury 2 priced?

    This page lists the current rates. Multiple providers can serve Mercury 2, so AI Gateway surfaces live pricing rather than a single fixed figure.