
NVIDIA Nemotron 3 Super 120B A12B

NVIDIA Nemotron 3 Super 120B A12B is NVIDIA's hybrid Mamba-Transformer mixture-of-experts (MoE) model, with 120B total and 12B active parameters, built for complex multi-agent applications. It features latent MoE and multi-token prediction.

index.ts
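import { streamText } from 'ai'
const result = streamText({
  model: 'nvidia/nemotron-3-super-120b-a12b',
  prompt: 'Why is the sky blue?'
})
// Print the response text as it streams in
for await (const textPart of result.textStream) {
  process.stdout.write(textPart)
}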

About NVIDIA Nemotron 3 Super 120B A12B

NVIDIA released NVIDIA Nemotron 3 Super 120B A12B on March 18, 2026 as the second model in the Nemotron 3 family, following Nano. It has 120B total parameters and 12B active parameters per token. The hybrid Mamba-Transformer MoE backbone interleaves Mamba-2 layers for long-sequence processing, Transformer attention layers for precise recall, and MoE layers for compute efficiency. NVIDIA Nemotron 3 Super 120B A12B delivers higher throughput than the previous Nemotron Super generation.
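With 12B of 120B parameters active, roughly a tenth of the weights are used for any given token, because each MoE layer runs only a few of its experts. The sketch below is a generic top-k MoE layer for illustration, not Nemotron's implementation: the expert count, k, the toy experts, the helper names, and the choice to apply softmax only over the selected experts (one common variant) are all assumptions.

```typescript
// Generic top-k MoE layer (toy sizes; not Nemotron's actual router or experts).
type Vec = number[];
type Expert = (x: Vec) => Vec;

function softmax(logits: Vec): Vec {
  const max = Math.max(...logits); // subtract max for numerical stability
  const exps = logits.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Indices of the k largest values, highest first.
function topK(values: Vec, k: number): number[] {
  return values
    .map((v, i) => ({ v, i }))
    .sort((a, b) => b.v - a.v)
    .slice(0, k)
    .map((p) => p.i);
}

// The router scores every expert, but only the top-k experts are evaluated.
function moeForward(
  x: Vec,
  routerLogits: Vec,
  experts: Expert[],
  k: number,
): { output: Vec; active: number[] } {
  const active = topK(routerLogits, k);
  // Gate weights are normalized over the selected experts only.
  const gates = softmax(active.map((i) => routerLogits[i]));
  const output: Vec = new Array(x.length).fill(0);
  active.forEach((expertIdx, j) => {
    const y = experts[expertIdx](x); // unselected experts never run
    for (let d = 0; d < x.length; d++) output[d] += gates[j] * y[d];
  });
  return { output, active };
}

// Usage: 8 hypothetical experts that scale their input; route to the top 2.
const experts: Expert[] = Array.from(
  { length: 8 },
  (_, e) => (x: Vec) => x.map((v) => v * (e + 1)),
);
const { output, active } = moeForward(
  [1, 2],
  [0.1, 2.0, 0.3, 1.5, -1, 0, 0.2, 0.4],
  experts,
  2,
);
console.log(active, output); // experts 1 and 3 are the only ones evaluated
```

Compute per token scales with k rather than with the total number of experts, which is how total and active parameter counts can differ by 10x.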

Two architectural innovations distinguish Super from Nano. The first is latent MoE: before routing, token embeddings are compressed into a low-rank latent space, which lets the model consult 4x as many experts at the same inference cost. This finer-grained routing allows distinct experts to activate for different subtasks (Python syntax, SQL logic, multi-hop reasoning) without paying the compute cost of running them all. The second is multi-token prediction (MTP): the model predicts multiple future tokens in a single forward pass. MTP strengthens reasoning during training and provides built-in speculative decoding at inference, yielding up to 3x speedups on structured generation tasks such as code and tool calls.
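The latent MoE cost argument can be sketched as follows. Because each expert operates on an r-dimensional latent vector instead of the d-dimensional embedding, its cost shrinks by a factor of d/r; with r = d/4, four times as many experts fit in the same expert compute. This is a minimal illustration under assumptions: the dimensions d, r, and hidden, the expert counts (4 vs 16), the top-k softmax gating, and the helper names matVec, expertCost, and latentMoe are hypothetical, and scoring experts on the latent vector follows the description above rather than a published implementation.

```typescript
// Toy latent MoE (hypothetical shapes; not Nemotron's actual configuration).
type Vec = number[];
type Mat = number[][]; // row-major, (out x in)

// y = W x
function matVec(W: Mat, x: Vec): Vec {
  return W.map((row) => row.reduce((acc, w, j) => acc + w * x[j], 0));
}

// Multiply-adds for one expert FFN mapping inDim -> hidden -> inDim.
function expertCost(inDim: number, hidden: number): number {
  return 2 * inDim * hidden;
}

// Compress d -> r, route and run top-k experts on the latent, expand r -> d.
function latentMoe(
  x: Vec,
  down: Mat, // r x d
  up: Mat, // d x r
  router: Mat, // numExperts x r: experts are scored on the latent vector
  experts: ((z: Vec) => Vec)[],
  k: number,
): Vec {
  const z = matVec(down, x);
  const chosen = matVec(router, z)
    .map((s, i) => ({ s, i }))
    .sort((a, b) => b.s - a.s)
    .slice(0, k);
  // Softmax gate weights over the selected experts only.
  const w = chosen.map((c) => Math.exp(c.s - chosen[0].s));
  const wSum = w.reduce((a, b) => a + b, 0);
  const zOut: Vec = new Array(z.length).fill(0);
  chosen.forEach(({ i }, j) => {
    const y = experts[i](z); // each expert works in r dims, not d
    for (let t = 0; t < z.length; t++) zOut[t] += (w[j] / wSum) * y[t];
  });
  return matVec(up, zOut);
}

// Cost comparison with hypothetical sizes.
const d = 4096, r = 1024, hidden = 2048;
const fullSpace = 4 * expertCost(d, hidden); // 4 experts in d dims
const latentSpace = 16 * expertCost(r, hidden); // 16 experts in r = d/4 dims
console.log(fullSpace === latentSpace); // true (projections not counted)

// Tiny forward pass: d = 4, r = 2, three experts that scale their input.
const out = latentMoe(
  [3, 1, 5, 5],
  [[1, 0, 0, 0], [0, 1, 0, 0]],
  [[1, 0], [0, 1], [0, 0], [0, 0]],
  [[1, 0], [0, 1], [-1, -1]],
  [0, 1, 2].map((e) => (z: Vec) => z.map((v) => v * (e + 1))),
  2,
);
console.log(out);
```

The shared down and up projections add a fixed per-token cost (2·d·r multiply-adds in this toy) that the expert-cost comparison leaves out, so "same inference cost" holds for the expert computation rather than exactly for the whole layer.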

On PinchBench (a benchmark that evaluates LLMs as the planning model of an OpenClaw agent), NVIDIA Nemotron 3 Super 120B A12B scores 85.6%. Model card on Amazon Bedrock: https://docs.aws.amazon.com/en_us/bedrock/latest/userguide/model-card-nvidia-nemotron-super-3-120b.html.