Mercury Coder Small Beta belongs to the Mercury family of diffusion large language models (dLLMs) from Inception Labs. Unlike autoregressive code models that emit tokens one at a time, it uses a coarse-to-fine generation process: it produces a rough complete draft, then refines all positions in parallel over a small number of passes. This lets it run faster than autoregressive alternatives at comparable quality tiers; live metrics on this page show current rates.
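The coarse-to-fine process can be illustrated with a toy loop: start from an all-masked draft and let every position receive a (possibly revised) proposal in the same pass. This is a conceptual sketch only, not Mercury's actual decoding algorithm; the `propose` callback is a hypothetical stand-in for the model's per-position prediction.

```python
def toy_diffusion_decode(length, steps, propose):
    """Toy illustration of coarse-to-fine decoding: begin with an
    all-masked draft, then refine every position in parallel on each
    pass. Conceptual sketch only -- not Mercury's real algorithm."""
    draft = ["<mask>"] * length
    for _ in range(steps):
        # every position is updated in the same pass, unlike
        # left-to-right autoregressive decoding
        draft = [propose(i, draft) for i in range(length)]
    return draft

# Usage: a trivially deterministic "proposer" that fills each slot.
tokens = ["def", "add", "(", "a", ",", "b", ")", ":"]
draft = toy_diffusion_decode(len(tokens), steps=3,
                             propose=lambda i, d: tokens[i])
```

The key contrast with autoregressive decoding is that the number of passes (`steps`) is small and fixed, independent of sequence length.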
Mercury Coder Small Beta scores 90.0 on HumanEval and 84.8 on fill-in-the-middle (FIM) tasks. FIM maps directly to IDE autocomplete, where the model completes code surrounded by existing context on both sides. Its MBPP score of 76.6 and MultiPL-E score of 76.2 cover Python-centric and multi-language coding evaluations, respectively.
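In an IDE, a FIM request typically sends the code before the cursor as the prompt and the code after the cursor as a suffix, and the model fills the gap. A minimal sketch of building such a request body, assuming an OpenAI-style completions payload with a `suffix` field; the `mercury-coder-small` model id is an illustrative assumption, not a confirmed API identifier:

```python
import json

def build_fim_payload(prefix, suffix,
                      model="mercury-coder-small",  # hypothetical model id
                      max_tokens=64):
    """Build a JSON body for an OpenAI-style completions request,
    the shape commonly used for fill-in-the-middle completion."""
    return json.dumps({
        "model": model,
        "prompt": prefix,       # code before the cursor
        "suffix": suffix,       # code after the cursor
        "max_tokens": max_tokens,
    })

# Example: ask the model to fill in a function body.
prefix = "def mean(xs):\n    "
suffix = "\n    return total / len(xs)"
body = build_fim_payload(prefix, suffix)
```

The returned completion would be spliced between `prefix` and `suffix` in the editor buffer.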
The model targets high-frequency, latency-sensitive coding applications: inline completions, documentation generation triggered on keystrokes, and fast unit test synthesis. At $0.25 input / $1.00 output per million tokens, Mercury Coder Small Beta suits developers who need reliable code quality without the cost or latency overhead of frontier-scale models on every request.
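At those rates, per-request cost is easy to estimate from token counts. A quick sketch using the listed pricing; the 2,000-token context and 50-token completion below are illustrative figures for a typical autocomplete call, not measured averages:

```python
# Listed pricing, in dollars per token.
INPUT_PRICE = 0.25 / 1_000_000
OUTPUT_PRICE = 1.00 / 1_000_000

def request_cost(input_tokens, output_tokens):
    """Dollar cost of a single request at the listed per-token rates."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# Illustrative autocomplete call: 2,000 tokens of context, 50 completed.
cost = request_cost(2_000, 50)  # ~$0.00055
```

At that rate, a developer triggering thousands of completions a day stays in the cents-per-day range.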