Nemotron 3 Ultra

Nemotron 3 Ultra is NVIDIA's largest open reasoning model, a hybrid Mamba-Transformer MoE with 550B total and 55B active parameters, latent MoE routing, multi-token prediction, and a context window of 1M tokens for long-running agent workflows.
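With 55B of the 550B parameters active per token, roughly 10% of the weights are used for any given token. The sketch below illustrates the general idea behind that split with plain top-k expert routing: a gate scores every expert, only the k best-scoring experts run, and their outputs are mixed with softmax weights. This is an assumption-laden toy, not NVIDIA's latent MoE router. `topKMoE`, the toy experts, and the gate weights are all hypothetical.

```typescript
// Generic top-k mixture-of-experts routing (toy example).
// NOT NVIDIA's latent MoE: it only illustrates why "active" parameters
// are a fraction of "total" parameters in an MoE model.

type Vec = number[];
type Expert = (x: Vec) => Vec;

const dot = (a: Vec, b: Vec): number =>
  a.reduce((sum, ai, i) => sum + ai * b[i], 0);

// Routes one token vector x to its k highest-scoring experts.
function topKMoE(
  x: Vec,
  gateWeights: Vec[], // one hypothetical gating vector per expert
  experts: Expert[],
  k: number,
): { output: Vec; used: number[]; gates: number[] } {
  // 1. Score every expert for this token.
  const logits = gateWeights.map((w) => dot(w, x));
  // 2. Keep only the indices of the k largest scores.
  const used = logits
    .map((logit, i) => ({ logit, i }))
    .sort((a, b) => b.logit - a.logit)
    .slice(0, k)
    .map((e) => e.i);
  // 3. Softmax over the selected scores (max subtracted for numerical stability).
  const maxLogit = Math.max(...used.map((i) => logits[i]));
  const exps = used.map((i) => Math.exp(logits[i] - maxLogit));
  const sum = exps.reduce((s, e) => s + e, 0);
  const gates = exps.map((e) => e / sum);
  // 4. Run ONLY the selected experts; the others do no work for this token.
  const output = x.map(() => 0);
  used.forEach((expertIdx, j) => {
    experts[expertIdx](x).forEach((v, d) => {
      output[d] += gates[j] * v;
    });
  });
  return { output, used, gates };
}

// Demo: 4 toy experts (each scales its input), 2 active per token.
const experts: Expert[] = [1, 2, 3, 4].map((s) => (x: Vec) => x.map((v) => v * s));
const gateWeights: Vec[] = [[1, 0], [0, 1], [1, 1], [-1, -1]];
const demo = topKMoE([1, 2], gateWeights, experts, 2);
console.log(demo.used); // [ 2, 1 ]
console.log(demo.output);
```

Here half of the experts run per token; in a production MoE the same mechanism lets most expert weights sit idle for any single token, which is what the total-versus-active parameter counts describe.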

Capabilities: Reasoning, Tool Use, Implicit Caching

index.ts

import { streamText } from 'ai'

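const result = streamText({
  model: 'nvidia/nemotron-3-ultra-550b-a55b',
  prompt: 'Why is the sky blue?'
})

// Print the response text as it streams in
for await (const textPart of result.textStream) {
  process.stdout.write(textPart)
}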


More models by NVIDIA

Model	Context	Latency	Throughput	Input	Output	Cache	Release Date
nvidia/nemotron-3-super-120b-a12b	256K	1.8 s	365 tps	$0.15/M	$0.65/M	Read: $0.06/M, Write: —	03/11/2026
nvidia/nemotron-3-nano-30b-a3b	262K	0.2 s	80 tps	$0.05/M	$0.24/M	—	12/15/2025
nvidia/nemotron-nano-12b-v2-vl	131K	0.2 s	192 tps	$0.20/M	$0.60/M	—	10/28/2025
nvidia/nemotron-nano-9b-v2	131K	0.2 s	182 tps	$0.06/M	$0.23/M	—	08/18/2025
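To turn the per-million-token prices above into a per-request figure, the sketch below multiplies each token count by its rate and divides by one million. It assumes that, with implicit caching, cached prompt tokens are billed at the listed cache Read rate instead of the input rate; that is an assumption about billing, not a documented rule, and cache writes or other fees are ignored. `estimateCostUSD` and `PRICES` are hypothetical names, and Nemotron 3 Ultra is left out because its prices are not listed in this section.

```typescript
// Rough per-request cost estimate using the $/1M-token prices listed above.
// Assumptions (not official billing rules): cached input tokens are billed at
// the listed cache "Read" rate; with no cache price listed they are billed as
// normal input. Cache writes and any other fees are ignored.

interface Pricing {
  inputPerM: number; // USD per 1M input tokens
  outputPerM: number; // USD per 1M output tokens
  cacheReadPerM?: number; // USD per 1M cached input tokens, when listed
}

const PRICES: Record<string, Pricing> = {
  'nvidia/nemotron-3-super-120b-a12b': { inputPerM: 0.15, outputPerM: 0.65, cacheReadPerM: 0.06 },
  'nvidia/nemotron-3-nano-30b-a3b': { inputPerM: 0.05, outputPerM: 0.24 },
  'nvidia/nemotron-nano-12b-v2-vl': { inputPerM: 0.2, outputPerM: 0.6 },
  'nvidia/nemotron-nano-9b-v2': { inputPerM: 0.06, outputPerM: 0.23 },
};

function estimateCostUSD(
  model: string,
  inputTokens: number,
  outputTokens: number,
  cachedInputTokens = 0, // portion of inputTokens assumed served from cache
): number {
  const p = PRICES[model];
  if (!p) throw new Error(`No pricing listed for ${model}`);
  if (cachedInputTokens > inputTokens) {
    throw new Error('cachedInputTokens cannot exceed inputTokens');
  }
  // Without a listed cache price, cached tokens fall back to the input rate.
  const cacheRate = p.cacheReadPerM ?? p.inputPerM;
  const uncachedTokens = inputTokens - cachedInputTokens;
  const micros =
    uncachedTokens * p.inputPerM +
    cachedInputTokens * cacheRate +
    outputTokens * p.outputPerM;
  return micros / 1e6;
}

// Example: 100K-token prompt, 80K of it served from cache, 2K-token answer.
const superModel = 'nvidia/nemotron-3-super-120b-a12b';
const withCache = estimateCostUSD(superModel, 100_000, 2_000, 80_000);
const withoutCache = estimateCostUSD(superModel, 100_000, 2_000);
console.log(withCache.toFixed(4), withoutCache.toFixed(4)); // 0.0091 0.0163
```

Under these assumptions, caching 80K of a 100K-token prompt cuts the Super estimate from about $0.0163 to $0.0091, which is where implicit caching pays off in long agent contexts that resend the same prefix.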
