NVIDIA Nemotron 3 Super 120B A12B

NVIDIA Nemotron 3 Super 120B A12B is NVIDIA's hybrid Mamba-Transformer mixture-of-experts model with 120B total and 12B active parameters, built for complex multi-agent applications and featuring a latent MoE design and multi-token prediction.

index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'nvidia/nemotron-3-super-120b-a12b',
  prompt: 'Why is the sky blue?',
})

// Print the streamed tokens as they arrive.
for await (const textPart of result.textStream) {
  process.stdout.write(textPart)
}
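
Because Nemotron 3 Super targets agentic, multi-agent workloads, it is commonly paired with tool calling. Below is a minimal sketch of that pattern using the AI SDK's generateText and tool helpers, assuming the AI SDK v5 API; the getWeather tool, its schema, and the five-step limit are illustrative assumptions, not part of this model's documentation.

import { generateText, tool, stepCountIs } from 'ai'
import { z } from 'zod'

// Hypothetical tool: the model decides when to call it and with what arguments.
const getWeather = tool({
  description: 'Get the current weather for a city',
  inputSchema: z.object({ city: z.string() }),
  execute: async ({ city }) => ({ city, temperatureC: 21 }), // stubbed result for the sketch
})

const { text } = await generateText({
  model: 'nvidia/nemotron-3-super-120b-a12b',
  tools: { getWeather },
  stopWhen: stepCountIs(5), // allow up to five model/tool round-trips
  prompt: 'Should I pack an umbrella for Berlin today?',
})

console.log(text)

The model can call getWeather, receive the stubbed result, and fold it into its final answer within the step limit.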

More models by NVIDIA

Context | Latency | Throughput | Input   | Output  | Providers          | Release Date
131K    | 0.2s    | 150 tps    | $0.06/M | $0.23/M | Bedrock, DeepInfra | 08/18/2025
131K    | 0.2s    | 165 tps    | $0.20/M | $0.60/M | Bedrock, DeepInfra | 12/01/2024
262K    | 0.3s    | 67 tps     | $0.05/M | $0.24/M | DeepInfra          | 12/01/2024