Nemotron 3 Ultra

Nemotron 3 Ultra is NVIDIA's largest open reasoning model, a hybrid Mamba-Transformer MoE with 550B total and 55B active parameters, latent MoE routing, multi-token prediction, and a context window of 1M tokens for long-running agent workflows.

Reasoning, Tool Use, Implicit Caching

index.ts

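import { streamText } from 'ai'

const result = streamText({
  model: 'nvidia/nemotron-3-ultra-550b-a55b',
  prompt: 'Why is the sky blue?'
})

// Print the response to stdout as it streams in
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}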


Throughput (24 hours)

P50 (median) throughput on live AI Gateway traffic, in tokens per second (TPS).
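To make the metric concrete, here is a minimal sketch of how a P50 throughput figure could be derived from per-request measurements. The `RequestSample` type and `p50Throughput` helper are hypothetical names introduced for illustration; the page does not say how AI Gateway actually samples or aggregates traffic, so treat this as one plausible reading of "P50 TPS" rather than the gateway's method.

```typescript
// Hypothetical shape of one measured request (not an AI Gateway type).
interface RequestSample {
  outputTokens: number    // tokens generated in the response
  durationSeconds: number // time spent generating, assumed > 0
}

// Median (P50) of per-request tokens-per-second values.
function p50Throughput(samples: RequestSample[]): number {
  if (samples.length === 0) {
    throw new Error('p50Throughput needs at least one sample')
  }
  // Convert each request to TPS, then sort ascending.
  const tps = samples
    .map(s => s.outputTokens / s.durationSeconds)
    .sort((a, b) => a - b)
  const mid = Math.floor(tps.length / 2)
  // Odd count: middle value. Even count: mean of the two middle values.
  return tps.length % 2 === 1 ? tps[mid] : (tps[mid - 1] + tps[mid]) / 2
}
```

A median is less sensitive than a mean to a few unusually slow or fast requests, which is one reason a dashboard might report P50 rather than an average.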
