Nemotron 3 Ultra

A 550B-parameter (55B active) open reasoning model from NVIDIA, built for long-running agent workflows. It uses a hybrid Mamba-Transformer MoE architecture and supports a 1M-token context window.

Reasoning · Tool Use · Implicit Caching
index.ts
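import { streamText } from 'ai'
// The string model ID assumes gateway credentials are configured in the environment.
const result = streamText({
  model: 'nvidia/nemotron-3-ultra-550b-a55b',
  prompt: 'Why is the sky blue?'
})
// Print the response text as it streams in.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}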

More models by NVIDIA

Model | Context | Latency | Throughput | Input | Output | Cache | Web Search | Per Query | Capabilities | Providers | ZDR | No Training | Release Date
- | 256K | 1.6s | 532tps | $0.15/M | $0.65/M | Read: -, Write: $0.06/M | - | - | - | baseten, bedrock | - | - | 03/18/2026
- | 131K | 0.2s | 116tps | $0.06/M | $0.23/M | - | - | - | - | bedrock, deepinfra | - | - | 08/18/2025
- | 262K | 0.3s | 91tps | $0.05/M | $0.24/M | - | - | - | - | deepinfra | - | - | 12/01/2024
- | 131K | 0.2s | - | $0.20/M | $0.60/M | - | - | - | +1 | bedrock, deepinfra | - | - | 12/01/2024
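
The per-million-token prices in the table above can be turned into a rough per-request cost estimate. The following is a minimal sketch, not an official pricing calculator: the row keys (`row1` to `row4`), the `Pricing` type, and `estimateCostUSD` are hypothetical names introduced here, since the listing does not show model names. It assumes cost scales linearly with token counts and ignores cache read/write pricing.

```typescript
// Per-million-token prices in USD, copied from the table above.
// Row keys are hypothetical labels; the listing does not name the models.
interface Pricing {
  inputPerM: number;  // USD per 1M input tokens
  outputPerM: number; // USD per 1M output tokens
}

const pricing = {
  row1: { inputPerM: 0.15, outputPerM: 0.65 }, // 256K context, 03/18/2026
  row2: { inputPerM: 0.06, outputPerM: 0.23 }, // 131K context, 08/18/2025
  row3: { inputPerM: 0.05, outputPerM: 0.24 }, // 262K context, 12/01/2024
  row4: { inputPerM: 0.20, outputPerM: 0.60 }, // 131K context, 12/01/2024
};

// Assumes simple linear pricing; cache read/write charges are not included.
function estimateCostUSD(
  p: Pricing,
  inputTokens: number,
  outputTokens: number
): number {
  return (
    (inputTokens / 1_000_000) * p.inputPerM +
    (outputTokens / 1_000_000) * p.outputPerM
  );
}

// Example: 200,000 input tokens and 10,000 output tokens on the first row.
const cost = estimateCostUSD(pricing.row1, 200_000, 10_000);
console.log(cost.toFixed(4));
```

For the first row this works out to 0.2 × $0.15 + 0.01 × $0.65, or roughly $0.0365 for the request.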