Nemotron 3 Ultra

Nemotron 3 Ultra is NVIDIA's largest open reasoning model, a hybrid Mamba-Transformer MoE with 550B total and 55B active parameters, latent MoE routing, multi-token prediction, and a context window of 1M tokens for long-running agent workflows.

Reasoning, Tool Use, Implicit Caching

index.ts

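import { streamText } from 'ai'

const result = streamText({
  model: 'nvidia/nemotron-3-ultra-550b-a55b',
  prompt: 'Why is the sky blue?'
})

// Print the response as chunks arrive.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}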


Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
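As a rough illustration of setting a provider preference, the sketch below builds an ordered preference object from provider slugs. It assumes the gateway reads an ordered list under `providerOptions.gateway.order`, which this page does not confirm. The helper name `preferProviders` and the slugs `deepinfra` and `baseten` are hypothetical placeholders, so copy the real slugs from the provider list and check the docs for the exact option names.

```typescript
// Shape assumed for illustration only; confirm the option names in the docs.
type ProviderPreference = { gateway: { order: string[] } }

// Hypothetical helper: turns provider slugs into an ordered preference.
function preferProviders(...slugs: string[]): ProviderPreference {
  if (slugs.length === 0) {
    throw new Error('at least one provider slug is required')
  }
  // Drop repeated slugs, keeping the first occurrence so the order stays meaningful.
  const order = slugs.filter((slug, i) => slugs.indexOf(slug) === i)
  return { gateway: { order } }
}

// Placeholder slugs, not taken from this page.
const providerOptions = preferProviders('deepinfra', 'baseten')
console.log(JSON.stringify(providerOptions))

// Assumed usage with the streamText call from index.ts:
// streamText({ model: 'nvidia/nemotron-3-ultra-550b-a55b', prompt: '...', providerOptions })
```

Keeping the preference in a small helper makes it easy to change the order in one place if a provider's latency or pricing changes.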

Provider	Context / Max Output	Latency	Throughput	Input	Output	Cache Read	Cache Write	Legal	Release Date
Together AI	65K	0.2s	165tps	$0.60/M	$3.60/M	$0.20/M	—	Terms • Privacy	06/04/2026
DeepInfra	262K / 65K	1.2s	36tps	$0.50/M	$2.50/M	$0.15/M	—	Terms • Privacy	06/04/2026
Blackbox AI	65K	2.9s	100tps	$0.37/M	$1.08/M	—	—	Terms • Privacy	06/04/2026
Baseten	65K	0.3s	146tps	$0.60/M	$2.40/M	$0.12/M	—	Terms • Privacy	06/04/2026
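To compare providers for a given workload, the listed prices, latency, and throughput can be combined into a rough per-request estimate. The sketch below is illustrative only: the function names are hypothetical, cache discounts are ignored, and it assumes that the latency figure is time to first token and that throughput is a steady output rate, neither of which the listing states.

```typescript
// Figures copied from the provider listing (prices in USD per million tokens).
interface ProviderQuote {
  name: string
  inputPerM: number
  outputPerM: number
  latencySec: number
  tokensPerSec: number
}

const providers: ProviderQuote[] = [
  { name: 'Together AI', inputPerM: 0.6, outputPerM: 3.6, latencySec: 0.2, tokensPerSec: 165 },
  { name: 'DeepInfra', inputPerM: 0.5, outputPerM: 2.5, latencySec: 1.2, tokensPerSec: 36 },
  { name: 'Blackbox AI', inputPerM: 0.37, outputPerM: 1.08, latencySec: 2.9, tokensPerSec: 100 },
  { name: 'Baseten', inputPerM: 0.6, outputPerM: 2.4, latencySec: 0.3, tokensPerSec: 146 },
]

// Cost in USD for one request, ignoring any cache read discount.
function estimateCostUsd(p: ProviderQuote, inputTokens: number, outputTokens: number): number {
  return (inputTokens * p.inputPerM + outputTokens * p.outputPerM) / 1_000_000
}

// Rough wall-clock seconds: assumed time to first token plus generation time.
function estimateSeconds(p: ProviderQuote, outputTokens: number): number {
  return p.latencySec + outputTokens / p.tokensPerSec
}

// Example workload: 10,000 input tokens and 2,000 output tokens.
for (const p of providers) {
  const cost = estimateCostUsd(p, 10_000, 2_000)
  const secs = estimateSeconds(p, 2_000)
  console.log(`${p.name}: $${cost.toFixed(4)}, ~${secs.toFixed(1)}s`)
}
```

For output-heavy agent workloads the output price and throughput dominate both estimates, so the cheapest provider per input token is not necessarily the cheapest or fastest overall.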
