
Nemotron 3 Ultra

A 550B-parameter (55B active) open reasoning model from NVIDIA, built for long-running agent workflows. It uses a hybrid Mamba-Transformer MoE architecture and supports a 1M-token context window.

Reasoning, Tool Use, Implicit Caching
index.ts
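import { streamText } from 'ai'

const result = streamText({
  model: 'nvidia/nemotron-3-ultra-550b-a55b',
  prompt: 'Why is the sky blue?'
})

// Print the response text as it streams in.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}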

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
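
As a rough illustration of setting a provider preference, the sketch below builds a routing options object from an ordered list of provider slugs. It assumes the gateway accepts a `providerOptions.gateway.order` array, which this page does not confirm. The helper `preferProviders` and the slugs `provider-a` and `provider-b` are hypothetical placeholders, so copy the real slugs from the provider list.

```typescript
// Assumed shape of the routing options; not confirmed by this page.
interface GatewayRouting {
  providerOptions: { gateway: { order: string[] } };
}

// Hypothetical helper: turn an ordered list of provider slugs into routing options.
// The first slug is the most preferred provider.
function preferProviders(...slugs: string[]): GatewayRouting {
  if (slugs.length === 0) {
    throw new Error('at least one provider slug is required');
  }
  return { providerOptions: { gateway: { order: [...slugs] } } };
}

// Placeholder slugs; replace them with slugs copied from the provider list.
const routing = preferProviders('provider-a', 'provider-b');
console.log(JSON.stringify(routing));
```

If the assumption holds, the result could be spread into the request, for example `streamText({ model, prompt, ...routing })`.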

| Provider | Legal | Context | Latency | Throughput | Input | Output | Cache read | Cache write | Web Search (per query) | Capabilities | ZDR | No Training | Release Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Together AI | Terms, Privacy | 1M | 0.4s | 178tps | $0.60/M | $3.60/M | $0.2/M | | | +1 | | | 06/04/2026 |
| DeepInfra | Terms, Privacy | 262K | 2.0s | 30tps | $0.50/M | $2.50/M | $0.15/M | | | +1 | | | 06/04/2026 |
| Blackbox | Terms, Privacy | 1M | 1.5s | 55tps | $0.37/M | $1.08/M | | | | +1 | | | 06/04/2026 |
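
To compare providers on price, the per-million-token rates above can be turned into a per-request estimate. The sketch below is only an approximation: it ignores cache read discounts and any other fees, and the helper `estimateCostUsd` and the token counts are made up for illustration.

```typescript
// Prices in USD per million tokens, copied from the provider table.
interface ProviderPricing {
  provider: string;
  inputPerM: number;
  outputPerM: number;
}

const pricing: ProviderPricing[] = [
  { provider: 'Together AI', inputPerM: 0.60, outputPerM: 3.60 },
  { provider: 'DeepInfra', inputPerM: 0.50, outputPerM: 2.50 },
  { provider: 'Blackbox', inputPerM: 0.37, outputPerM: 1.08 },
];

// Hypothetical helper: cost of one request, ignoring cache discounts.
function estimateCostUsd(
  p: ProviderPricing,
  inputTokens: number,
  outputTokens: number,
): number {
  return (inputTokens * p.inputPerM + outputTokens * p.outputPerM) / 1e6;
}

// Example workload (illustrative): 100,000 input tokens and 5,000 output tokens.
for (const p of pricing) {
  const cost = estimateCostUsd(p, 100000, 5000);
  console.log(`${p.provider}: $${cost.toFixed(4)}`);
}
```

Price is only one axis. The table's latency, throughput, and context columns differ across providers as well, so the cheapest row is not necessarily the best fit for a long-context or latency-sensitive workload.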