Nemotron 3 Ultra

A 550B-parameter (55B active) open reasoning model from NVIDIA, built for long-running agent workflows. It uses a hybrid Mamba-Transformer MoE architecture and supports a 1M-token context window.

Reasoning, Tool Use, Implicit Caching
index.ts
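import { streamText } from 'ai'

const result = streamText({
  model: 'nvidia/nemotron-3-ultra-550b-a55b',
  prompt: 'Why is the sky blue?',
})

// Print the response as it streams in.
for await (const text of result.textStream) {
  process.stdout.write(text)
}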

Playground

Try out Nemotron 3 Ultra by NVIDIA. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
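As a rough sketch of setting a provider preference, the helper below builds a routing object, assuming AI Gateway reads preferences from `providerOptions.gateway` with `order` and `only` keys. That shape is an assumption to check against the docs. The helper name `buildProviderOptions` and the slugs `'deepinfra'` and `'togetherai'` are hypothetical; copy the real slugs from the Providers table.

```typescript
// Assumed shape of the gateway routing options; verify against the docs.
interface GatewayRouting {
  order: string[] // provider slugs to try, in priority order
  only?: string[] // optionally restrict routing to these slugs
}

// Hypothetical helper: removes duplicate slugs while keeping their order.
function buildProviderOptions(
  order: string[],
  only?: string[],
): { gateway: GatewayRouting } {
  const unique = (slugs: string[]) =>
    slugs.filter((slug, i) => slugs.indexOf(slug) === i)

  const gateway: GatewayRouting = { order: unique(order) }
  if (only && only.length > 0) {
    gateway.only = unique(only)
  }
  return { gateway }
}

// Hypothetical slugs: prefer DeepInfra, then fall back to Together AI.
const preferred = buildProviderOptions(['deepinfra', 'togetherai'])
console.log(JSON.stringify(preferred))
// prints {"gateway":{"order":["deepinfra","togetherai"]}}
```

Under the same assumption, the returned object would be passed as `providerOptions` in the `streamText` call, next to `model` and `prompt`.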

| Provider | Context | Latency | Throughput | Input | Output | Cache | Web Search (Per Query) | Capabilities | ZDR | No Training | Release Date |
| Together AI (Legal: Terms, Privacy) | 1M | 0.4s | | $0.60/M | $3.60/M | Read: $0.2/M | | +1 | | | 06/04/2026 |
| DeepInfra (Legal: Terms, Privacy) | 262K | 1.7s | 36tps | $0.50/M | $2.50/M | Read: $0.15/M | | +1 | | | 06/04/2026 |
| Blackbox (Legal: Terms, Privacy) | 1M | | | $0.37/M | $1.08/M | | | +1 | | | 06/04/2026 |
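To make the per-million-token prices concrete, here is a minimal cost estimate using the input, output, and cache-read prices listed for the three providers. The function name `estimateCostUSD` and the request size are illustrative. It also assumes that cached input tokens are billed at the cache-read rate instead of the input rate, which the page does not state.

```typescript
interface Pricing {
  inputPerM: number // USD per 1M input tokens
  outputPerM: number // USD per 1M output tokens
  cacheReadPerM?: number // USD per 1M cached input tokens, where listed
}

// Prices as listed per provider on this page.
const pricing: { [provider: string]: Pricing } = {
  'Together AI': { inputPerM: 0.6, outputPerM: 3.6, cacheReadPerM: 0.2 },
  DeepInfra: { inputPerM: 0.5, outputPerM: 2.5, cacheReadPerM: 0.15 },
  Blackbox: { inputPerM: 0.37, outputPerM: 1.08 },
}

// Assumption: cached tokens are charged at the cache-read rate when one is
// listed, and at the normal input rate otherwise.
function estimateCostUSD(
  p: Pricing,
  inputTokens: number,
  outputTokens: number,
  cachedTokens = 0,
): number {
  const cachedRate = p.cacheReadPerM !== undefined ? p.cacheReadPerM : p.inputPerM
  const freshTokens = inputTokens - cachedTokens
  const total =
    freshTokens * p.inputPerM +
    cachedTokens * cachedRate +
    outputTokens * p.outputPerM
  return total / 1_000_000
}

// Illustrative request: 200K input tokens, 50K output tokens, no cache hits.
Object.keys(pricing).forEach((name) => {
  const cost = estimateCostUSD(pricing[name], 200_000, 50_000)
  console.log(`${name}: $${cost.toFixed(4)}`)
})
```

For this request size the estimate comes to about $0.30 on Together AI, $0.225 on DeepInfra, and $0.128 on Blackbox.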
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.
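The two metrics combine into a rough end-to-end estimate: time to first token plus output length divided by throughput. The sketch below assumes throughput stays steady after the first token and that the P50 values are representative of a given request, neither of which is guaranteed. The function name `estimateResponseSeconds` and the 500-token output length are illustrative.

```typescript
// Rough estimate: TTFT + (output tokens / tokens per second).
// Assumes steady throughput once streaming has started.
function estimateResponseSeconds(
  ttftSeconds: number,
  outputTokens: number,
  tokensPerSecond: number,
): number {
  if (tokensPerSecond <= 0) {
    throw new Error('tokensPerSecond must be positive')
  }
  return ttftSeconds + outputTokens / tokensPerSecond
}

// DeepInfra figures from this page: 1.7s latency, 36tps throughput.
const seconds = estimateResponseSeconds(1.7, 500, 36)
console.log(`~${seconds.toFixed(1)}s for a 500-token response`)
// prints ~15.6s for a 500-token response
```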

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.

More models by NVIDIA

| Model | Context | Latency | Throughput | Input | Output | Cache | Web Search (Per Query) | Capabilities | Providers | ZDR | No Training | Release Date |
| | 256K | 0.2s | 67tps | $0.15/M | $0.65/M | Write: $0.06/M | | | Baseten, Bedrock | | | 03/18/2026 |
| | 131K | 0.2s | 111tps | $0.06/M | $0.23/M | | | | Bedrock, DeepInfra | | | 08/18/2025 |
| | 262K | 0.6s | 99tps | $0.05/M | $0.24/M | | | | DeepInfra | | | 12/01/2024 |
| | 131K | 0.2s | | $0.20/M | $0.60/M | | | +1 | Bedrock, DeepInfra | | | 12/01/2024 |