
Mercury 2

Mercury 2 is Inception's reasoning diffusion language model. It refines tokens in parallel with tunable reasoning depth, native tool use, and a context window of 128K tokens.

Tool Use, Reasoning
index.ts
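import { streamText } from 'ai'

// Start a streaming generation against Mercury 2.
const result = streamText({
  model: 'inception/mercury-2',
  prompt: 'Why is the sky blue?'
})

// Print text chunks as they arrive.
for await (const chunk of result.textStream) process.stdout.write(chunk)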

About Mercury 2

Mercury 2 departs from the autoregressive strategy that defines most large language models (LLMs). Instead of producing one token at a time, left to right, Mercury 2 operates on a diffusion principle: it starts with a rough draft of the full response and refines multiple tokens in parallel across a small number of steps. Because each step updates many tokens at once, generation needs fewer sequential passes than token-by-token decoding, which is why Mercury 2 generates faster than autoregressive approaches. Live metrics on this page show current rates.
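To make the step-count difference concrete, here is a toy sketch of parallel refinement. It is not Mercury 2's actual algorithm: `toyDenoiser` is a hypothetical stand-in that already knows the target tokens and assigns an arbitrary confidence score, and `refineInParallel` simply commits the most confident positions at each step. It only illustrates why updating several tokens per step takes fewer sequential steps than emitting one token at a time.

```typescript
// Toy illustration only. All names here are hypothetical, not part of any Mercury 2 API.
type Candidate = { index: number; token: string; confidence: number }

// Stand-in for a denoising model: proposes a token and a made-up confidence for one position.
function toyDenoiser(target: string[], index: number): Candidate {
  return { index, token: target[index], confidence: ((index * 7) % 10) / 10 }
}

// Start from a full-length placeholder draft and commit up to `tokensPerStep`
// of the most confident positions on every refinement step.
function refineInParallel(target: string[], tokensPerStep: number) {
  if (tokensPerStep < 1) throw new Error('tokensPerStep must be at least 1')
  const draft: string[] = target.map(() => '?')
  const committed: boolean[] = target.map(() => false)
  let remaining = target.length
  let steps = 0
  while (remaining > 0) {
    const candidates: Candidate[] = []
    for (let i = 0; i < target.length; i++) {
      if (!committed[i]) candidates.push(toyDenoiser(target, i))
    }
    candidates.sort((a, b) => b.confidence - a.confidence)
    for (const c of candidates.slice(0, tokensPerStep)) {
      draft[c.index] = c.token
      committed[c.index] = true
      remaining--
    }
    steps++
  }
  return { output: draft, steps }
}

const sentence = 'the sky scatters blue light more than red'
const demo = refineInParallel(sentence.split(' '), 4)
console.log(demo.steps) // 2 refinement steps for 8 tokens
```

A one-token-at-a-time decoder would need 8 sequential steps for the same 8 tokens; committing 4 positions per step needs 2.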

Mercury 2 supports tunable reasoning depth. You adjust refinement steps up or down to trade latency for quality on each request. Native tool use and schema-aligned JSON output let you embed it in function-calling pipelines and structured extraction workflows without extra parsing layers.
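As a rough sketch of what a function-calling request could look like, the snippet below builds a request body in the OpenAI chat-completions tool format and decodes a tool call of the kind that format returns. Several things are assumptions for illustration: the `get_weather` tool, the helper names `buildToolRequest` and `parseToolCall`, and the reuse of the `inception/mercury-2` model ID in this payload. No network call is made, and no reasoning-depth parameter is shown because its name is not given here.

```typescript
// Sketch only: shapes follow the OpenAI chat-completions tool format.
// The tool and helper names are hypothetical.
interface ToolDefinition {
  type: 'function'
  function: { name: string; description: string; parameters: Record<string, unknown> }
}

interface ToolCall {
  id: string
  type: 'function'
  function: { name: string; arguments: string } // arguments arrive as a JSON string
}

// Hypothetical tool described with a JSON Schema for its arguments.
const getWeather: ToolDefinition = {
  type: 'function',
  function: {
    name: 'get_weather',
    description: 'Look up the current weather for a city',
    parameters: {
      type: 'object',
      properties: { city: { type: 'string' } },
      required: ['city']
    }
  }
}

// Build the request body; the model ID is assumed to match the one used on this page.
function buildToolRequest(userPrompt: string) {
  return {
    model: 'inception/mercury-2',
    messages: [{ role: 'user' as const, content: userPrompt }],
    tools: [getWeather],
    tool_choice: 'auto' as const
  }
}

// Decode the arguments of a returned tool call into a plain object.
function parseToolCall(call: ToolCall): { name: string; args: Record<string, unknown> } {
  return { name: call.function.name, args: JSON.parse(call.function.arguments) }
}

const toolRequest = buildToolRequest('What is the weather in Oslo?')
const sampleCall: ToolCall = {
  id: 'call_1',
  type: 'function',
  function: { name: 'get_weather', arguments: '{"city":"Oslo"}' }
}
const parsedCall = parseToolCall(sampleCall)
console.log(parsedCall.name, parsedCall.args)
```

Because the arguments are expected to conform to the declared schema, `JSON.parse` is the only decoding step in this sketch; production code would still validate the parsed object before acting on it.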

With a context window of 128K tokens, OpenAI API compatibility, and pricing of $0.25 input / $0.75 output per million tokens, Mercury 2 fits production-scale agentic workloads where inference runs dozens of times per task. Teams building multi-step coding assistants, retrieval-augmented generation (RAG) pipelines, or real-time voice interfaces gain headroom to run more refinement iterations within a fixed latency budget.
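For budgeting, the per-token prices above translate into a simple estimate of cost per agentic task. The sketch below assumes a made-up workload of 30 calls per task, each using 4,000 input tokens and 500 output tokens; the helper name `estimateTaskCostUsd` and the workload figures are illustrative, and only the two prices come from this page.

```typescript
// Prices from this page, in USD per million tokens.
const INPUT_USD_PER_MILLION = 0.25
const OUTPUT_USD_PER_MILLION = 0.75

interface CallUsage {
  inputTokens: number
  outputTokens: number
}

// Sum token usage across all calls in a task and convert to USD.
function estimateTaskCostUsd(calls: CallUsage[]): number {
  let inputTokens = 0
  let outputTokens = 0
  for (const call of calls) {
    inputTokens += call.inputTokens
    outputTokens += call.outputTokens
  }
  return (inputTokens * INPUT_USD_PER_MILLION + outputTokens * OUTPUT_USD_PER_MILLION) / 1e6
}

// Hypothetical workload: 30 calls per task, 4,000 input and 500 output tokens each.
const calls: CallUsage[] = Array.from({ length: 30 }, () => ({
  inputTokens: 4000,
  outputTokens: 500
}))

const taskCostUsd = estimateTaskCostUsd(calls)
console.log(taskCostUsd.toFixed(5)) // 0.04125
```

Under these assumptions the task consumes 120,000 input tokens ($0.03) and 15,000 output tokens ($0.01125), so roughly four cents per task.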