Nova Micro

Nova Micro delivers text-only inference at high throughput, with per-token pricing below that of the multimodal Nova models in the same generation. It is purpose-built for latency-sensitive applications at scale.

index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'amazon/nova-micro',
  prompt: 'Why is the sky blue?'
})

About Nova Micro

Nova Micro launched alongside the rest of the first-generation Nova family, but its design philosophy differs. Nova Lite and Nova Pro layer in image and video understanding. Nova Micro drops multimodal support entirely. The result is a model that does one thing, text processing, at high speed and low cost.

The tradeoff is intentional. Dropping vision processing lets the model devote its capacity to text generation throughput within a 128K-token context window. Because per-token pricing is low, per-request costs stay low even in high-volume classification or tagging pipelines.

Many teams deploy Nova Micro as the default tier in a routing architecture. Straightforward text requests (classification, entity extraction, simple Q&A, short summaries) go to Micro. A request escalates to Lite, Pro, or a second-generation model only when it involves images, requires deep reasoning, or exceeds the 128K-token context window. This pattern keeps average cost per request low while covering the full range of task complexity.
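
The routing pattern described above can be sketched as a small model-selection function. This is a minimal illustration under stated assumptions, not a reference implementation: the `RoutedRequest` shape, the `pickModel` and `estimateTokens` helpers, and the four-characters-per-token estimate are all hypothetical, and the model IDs `amazon/nova-lite` and `amazon/nova-pro` are assumed by analogy with `amazon/nova-micro`. Which tier handles which escalation is also a choice made here for illustration.

```typescript
// Hypothetical request shape; field names are illustrative, not from any SDK.
interface RoutedRequest {
  text: string
  hasImages: boolean
  needsDeepReasoning: boolean
}

// The 128K-token context window, taken here as 128,000 tokens (assumption).
const MICRO_CONTEXT_TOKENS = 128_000

// Crude estimate: roughly 4 characters per token.
// This is an assumption for the sketch, not a real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

// Choose a model ID for a request. Only 'amazon/nova-micro' appears in the
// text above; the Lite and Pro IDs and their roles are assumed.
function pickModel(req: RoutedRequest): string {
  if (req.needsDeepReasoning) return 'amazon/nova-pro'
  if (req.hasImages) return 'amazon/nova-lite'
  if (estimateTokens(req.text) > MICRO_CONTEXT_TOKENS) return 'amazon/nova-pro'
  // Default tier: plain text that fits within the context window.
  return 'amazon/nova-micro'
}
```

The returned ID could then be passed as the `model` value in a `streamText` call like the one in index.ts. In practice the reasoning flag would likely come from a classifier or from caller metadata rather than a hard-coded boolean.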