Nemotron 3 Ultra
NVIDIA Nemotron 3 Ultra is a 550B-parameter (55B activated) open reasoning model built for long-running autonomous agents that handle orchestration and complex tasks across coding, deep research, and enterprise workflows. Its hybrid Mamba-Transformer MoE architecture combines Latent MoE, which calls 4 experts at the inference cost of 1, with Multi-Token Prediction for reduced generation time on long sequences and Token Budget support for optimal accuracy with minimum reasoning token output. The model supports a 1M-token context window and is fully open under the NVIDIA Open Model License, with open weights, training data, and recipes.
import { streamText } from 'ai'

const result = streamText({
  model: 'nvidia/nemotron-3-ultra-550b-a55b',
  prompt: 'Why is the sky blue?',
})

P50 throughput on live AI Gateway traffic, in tokens per second (TPS).
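The snippet above creates a stream but does not read from it. The following is a minimal sketch of one way to consume such a stream, assuming the result exposes its text as an async iterable of string chunks. `collectText` and `fakeStream` are hypothetical helpers introduced here for illustration; they are not part of the AI SDK, and `fakeStream` stands in for a live model response so the sketch can run offline.

```typescript
// Hypothetical helper (not part of the AI SDK): drains any async iterable of
// text chunks, passes each chunk to an optional callback, and returns the
// concatenated text once the stream ends.
async function collectText(
  chunks: AsyncIterable<string>,
  onChunk: (chunk: string) => void = () => {},
): Promise<string> {
  let text = '';
  for await (const chunk of chunks) {
    onChunk(chunk); // e.g. print the chunk as soon as it arrives
    text += chunk;
  }
  return text;
}

// Hypothetical stand-in for a model stream, used only to try the helper
// without a network call.
async function* fakeStream(parts: string[]): AsyncGenerator<string> {
  for (const part of parts) {
    yield part;
  }
}
```

With the `result` from the snippet above, a call along the lines of `await collectText(result.textStream, (c) => process.stdout.write(c))` would print tokens as they arrive and return the full answer, assuming `result.textStream` is an async iterable of strings.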