Nova Micro
Nova Micro is a text-only model that delivers high-throughput inference at per-token pricing below the multimodal Nova models of the same generation. It is purpose-built for latency-sensitive applications at scale.
import { streamText } from 'ai'
const result = streamText({ model: 'amazon/nova-micro', prompt: 'Why is the sky blue?' })
Playground
Try out Nova Micro by Amazon. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.
Providers
Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.
P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.
Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.
About Nova Micro
Nova Micro launched alongside the rest of the first-generation Nova family, but its design philosophy differs. Nova Lite and Nova Pro layer in image and video understanding. Nova Micro drops multimodal support entirely. The result is a model that does one thing, text processing, at high speed and low cost.
The tradeoff is intentional. With no vision processing to support, the model is geared toward generation throughput within its 128K-token context window. Per-request costs stay low even in high-volume classification or tagging pipelines.
Many teams deploy Nova Micro as the default tier in a routing architecture. Straightforward text requests (classification, entity extraction, simple Q&A, short summaries) go to Micro. A request escalates to Lite, Pro, or a second-generation model only when it involves images, requires deep reasoning, or exceeds the 128K-token context window. This pattern keeps average cost per request low while covering the full range of task complexity.
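The routing pattern described above can be sketched as a small selection function. This is a minimal illustration, not a reference implementation: the `RoutingRequest` shape, the `needsDeepReasoning` flag, the four-characters-per-token estimate, and every model slug other than `amazon/nova-micro` are assumptions made for the example and should be checked against the model catalog before use.

```typescript
// Hypothetical request shape for illustration; not part of any SDK.
interface RoutingRequest {
  text: string;
  hasImages: boolean;
  needsDeepReasoning: boolean;
}

// Context limit for Nova Micro as stated on this page.
const MICRO_CONTEXT_TOKENS = 128_000;

// Assumed slugs for the escalation tiers; only 'amazon/nova-micro' appears on this page.
const DEFAULT_MODEL = 'amazon/nova-micro';
const MULTIMODAL_MODEL = 'amazon/nova-lite';
const REASONING_MODEL = 'amazon/nova-pro';
const LONG_CONTEXT_MODEL = 'amazon/nova-pro'; // placeholder; pick a model with a larger window

// Rough heuristic (assumption): about 4 characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Send routine text to Micro and escalate only when a request needs more.
function chooseModel(req: RoutingRequest): string {
  if (req.hasImages) return MULTIMODAL_MODEL;
  if (req.needsDeepReasoning) return REASONING_MODEL;
  if (estimateTokens(req.text) > MICRO_CONTEXT_TOKENS) return LONG_CONTEXT_MODEL;
  return DEFAULT_MODEL;
}
```

The returned slug can then be passed as the `model` value in a call such as `streamText`. In practice, a real tokenizer count would be more reliable than the character heuristic used here.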
What To Consider When Choosing a Provider
- Configuration: If streaming throughput matters, test Nova Micro early. It's built for high text throughput within AI Gateway.
- Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
When to Use Nova Micro
Best For
- Chatbots and conversational interfaces: Interactive experiences where response speed affects UX
- Text classification at scale: Sentiment analysis and entity extraction across high request volumes
- Autocomplete and suggestions: Inline features where generation speed matters
- Default routing tier: Handle routine text requests at minimal cost in a model-routing architecture
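For the text classification use case above, most of the surrounding code is prompt construction and output validation. The sketch below shows one possible shape for both. The label set, prompt wording, and helper names (`buildPrompt`, `parseLabel`) are hypothetical, and the model call itself is left out.

```typescript
// Hypothetical label set for a sentiment task.
const LABELS = ['positive', 'negative', 'neutral'] as const;
type Label = (typeof LABELS)[number];

// Build a short, constrained prompt so the model replies with a single label.
function buildPrompt(text: string): string {
  return [
    `Classify the sentiment of the text as one of: ${LABELS.join(', ')}.`,
    'Reply with the label only.',
    '',
    `Text: ${text}`,
  ].join('\n');
}

// Normalize the raw model output and reject anything outside the label set.
function parseLabel(output: string): Label | null {
  const cleaned = output.trim().toLowerCase().replace(/[^a-z]/g, '');
  return (LABELS as readonly string[]).includes(cleaned) ? (cleaned as Label) : null;
}
```

The string from `buildPrompt` would be sent as the `prompt` with `model: 'amazon/nova-micro'`, and the response text passed through `parseLabel`. Returning `null` for unexpected output lets a pipeline retry or flag the item instead of storing an invalid label.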
Consider Alternatives When
- Multimodal inputs required: Switch to Nova Lite or Nova Pro when inputs include images, documents, or video
- Structured multi-step reasoning: Nova 2 Lite is better equipped for agentic tool use and deeper reasoning
- Context beyond 128K tokens: Long documents or extended conversations require a model with a larger context window
Conclusion
Nova Micro is the speed specialist of the Nova family. By focusing exclusively on text, it achieves throughput and pricing that suit many text-only workloads where response latency matters. Pair it with a routing layer that escalates to multimodal or reasoning-capable models when needed.