Nvidia Nemotron Nano 9B V2

Nvidia Nemotron Nano 9B V2 is a dense hybrid Mamba-Transformer reasoning model that matches or exceeds Qwen3-8B accuracy at up to 6x the throughput, with built-in thinking budget control.

Reasoning · Tool Use
index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'nvidia/nemotron-nano-9b-v2',
  prompt: 'Why is the sky blue?',
})

// Print the response as it streams in.
for await (const textPart of result.textStream) {
  process.stdout.write(textPart)
}

Frequently Asked Questions

  • How is this model different from Nemotron 3 Nano (30B/A3B)?

    They use different architectures. Nvidia Nemotron Nano 9B V2 is a dense 9B model with a context window of 131.1K tokens. Nemotron 3 Nano is a sparse MoE (30B total, 3B active) with a 1M-token context for multi-agent throughput. Choose based on whether your constraint is footprint (9B v2) or context scale (Nemotron 3 Nano).

  • What does thinking budget control mean in practice?

    You can instruct the model to reason briefly or in depth on a per-request basis. Brief reasoning produces faster, cheaper responses for straightforward tasks. Deep reasoning takes longer but improves accuracy on complex problems.
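One way to apply this per request in the AI SDK is through the system prompt. The sketch below assumes `/think` and `/no_think` as the control strings, following NVIDIA's published Nemotron chat conventions; confirm the exact toggles against the current model card before relying on them.

```typescript
// Sketch of per-request reasoning control via the system prompt.
// The '/think' and '/no_think' toggles are an assumption drawn from
// NVIDIA's Nemotron chat conventions — verify against the model card.
function reasoningSystem(deep: boolean): string {
  return deep ? '/think' : '/no_think'
}

// Pass the result as the `system` option of streamText, e.g.:
//   streamText({
//     model: 'nvidia/nemotron-nano-9b-v2',
//     system: reasoningSystem(false), // brief reasoning: faster, cheaper
//     prompt: 'What is 2 + 2?',
//   })
console.log(reasoningSystem(true)) // '/think'
```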

  • Where are input and output prices listed?

    Input and output token prices are listed on this page and update as providers adjust their rates. AI Gateway routes your requests through the configured provider, so the listed rate for that provider applies.