Nova Micro
Nova Micro delivers text-only inference at high throughput, with per-token pricing below the multimodal Nova models of the same generation. It is purpose-built for latency-sensitive applications at scale.
```typescript
import { streamText } from 'ai'

const result = streamText({
  model: 'amazon/nova-micro',
  prompt: 'Why is the sky blue?',
})
```

Playground

Try out Nova Micro by Amazon. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.
Providers
Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
| Provider | Throughput (TPS) | TTFT (ms) | Success rate |
|---|---|---|---|

- Throughput: P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.
- TTFT: P50 time to first token on live AI Gateway traffic, in milliseconds. Visit the docs for more info.
- Success rate: direct request success rate on AI Gateway, overall and per provider. Visit the docs for more info.
More models by Amazon
| Model |
|---|
About Nova Micro
Nova Micro launched alongside the rest of the first-generation Nova family, but its design philosophy differs. Nova Lite and Nova Pro layer in image and video understanding; Nova Micro drops multimodal support entirely. The result is a model that does one thing well: text processing at high speed and low cost.
The tradeoff is intentional. Removing vision processing frees capacity for text-generation throughput within a 128K-token context window, so even high-volume classification or tagging pipelines keep per-request costs low.
Many teams deploy Nova Micro as the default tier in a routing architecture. Straightforward text requests (classification, entity extraction, simple Q&A, short summaries) go to Micro. Only when a request involves images, requires deep reasoning, or exceeds the 128K-token context window does it escalate to Lite, Pro, or a second-generation model. This pattern keeps average cost per request low while covering the full range of task complexity.
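The routing tier described above can be sketched as a small model-selection helper. The request shape, the `needsReasoning` flag, and the character-per-token heuristic are illustrative assumptions for this sketch, not part of any SDK:

```typescript
// Illustrative routing tier: pick a model slug per request.
// The request shape and heuristics here are assumptions, not an SDK API.
interface RouteRequest {
  text: string
  hasImages?: boolean
  needsReasoning?: boolean
}

// Nova Micro's context window, in tokens.
const MICRO_CONTEXT = 128_000

// Crude token estimate (~4 characters per token); a production router
// would use a real tokenizer.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4)

function pickModel(req: RouteRequest): string {
  if (req.hasImages) return 'amazon/nova-lite' // multimodal escalation
  if (req.needsReasoning) return 'amazon/nova-pro' // deeper reasoning
  if (estimateTokens(req.text) > MICRO_CONTEXT) {
    return 'amazon/nova-pro' // context overflow
  }
  return 'amazon/nova-micro' // cheap default tier
}
```

A routine classification request then resolves to `amazon/nova-micro`, and the returned slug can be passed straight to `streamText` as the `model`.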
What To Consider When Choosing a Provider
- Configuration: If streaming throughput matters, test Nova Micro early. It's built for high text throughput within AI Gateway.
- Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
When to Use Nova Micro
Best For
- Chatbots and conversational interfaces: Interactive experiences where response speed affects UX
- Text classification at scale: Sentiment analysis and entity extraction across high request volumes
- Autocomplete and suggestions: Inline features where generation speed matters
- Default routing tier: Handle routine text requests at minimal cost in a model-routing architecture
Consider Alternatives When
- Multimodal inputs required: Switch to Nova Lite or Nova Pro when inputs include images, documents, or video
- Structured multi-step reasoning: Nova 2 Lite is better equipped for agentic tool use and deeper reasoning
- Context beyond 128K tokens: Long documents or extended conversations require a model with a larger context window
Conclusion
Nova Micro is the speed specialist of the Nova family. By focusing exclusively on text, it achieves throughput and pricing that suit many text-only workloads where response latency matters. Pair it with a routing layer that escalates to multimodal or reasoning-capable models when needed.
Frequently Asked Questions
Why would I choose Nova Micro over Nova Lite for text tasks?
Nova Micro is priced below Nova Lite's multimodal rate and is tuned for speed on pure text. If you never send images or video, Micro is usually the cheaper fit.
Can Nova Micro handle long documents?
Keep the prompt within 128K tokens. If you exceed that, split the document, summarize in chunks, or switch to Nova 2 Lite for a 1M-token window.
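One way to stay under the window is a simple chunk-and-summarize pass. The token budget and the 4-characters-per-token ratio below are rough assumptions, not exact tokenizer behavior:

```typescript
// Split a long document into chunks that each fit comfortably inside
// Nova Micro's 128K-token window. The 4-chars-per-token ratio is a
// rough heuristic; a production pipeline would use a real tokenizer.
const CHARS_PER_TOKEN = 4
const CHUNK_TOKEN_BUDGET = 100_000 // leave headroom for the prompt itself

function chunkDocument(doc: string, tokenBudget = CHUNK_TOKEN_BUDGET): string[] {
  const maxChars = tokenBudget * CHARS_PER_TOKEN
  const chunks: string[] = []
  for (let i = 0; i < doc.length; i += maxChars) {
    chunks.push(doc.slice(i, i + maxChars))
  }
  return chunks
}
```

Each chunk can then be summarized with `amazon/nova-micro`, and the per-chunk summaries combined in a final pass.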
Is Nova Micro good for structured output like JSON?
Yes. It follows instructions well for classification, tagging, and structured extraction. Its speed makes it especially efficient for pipelines that process many short requests.
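A common pattern is to prompt for JSON and validate the reply before trusting it. The `Sentiment` shape below is an illustrative assumption for this sketch:

```typescript
// Illustrative: validate a model's JSON reply before using it downstream.
// The Sentiment shape is a hypothetical example, not an SDK type.
interface Sentiment {
  label: 'positive' | 'negative' | 'neutral'
  confidence: number
}

function parseSentiment(reply: string): Sentiment | null {
  try {
    const parsed = JSON.parse(reply)
    const labels = ['positive', 'negative', 'neutral']
    if (
      labels.includes(parsed.label) &&
      typeof parsed.confidence === 'number' &&
      parsed.confidence >= 0 &&
      parsed.confidence <= 1
    ) {
      return parsed as Sentiment
    }
    return null // well-formed JSON but wrong shape; retry or escalate
  } catch {
    return null // malformed JSON; retry or escalate
  }
}
```

For schema-enforced output, the AI SDK also provides `generateObject`, which pairs well with a fast model like Nova Micro.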
What is the maximum output length?
Nova Micro generates up to 8.2K tokens per response.
How does Nova Micro perform on reasoning benchmarks?
Nova Micro isn't designed for complex reasoning. It excels at speed and cost efficiency for routine language tasks. For reasoning-heavy workloads, consider Nova 2 Lite or Nova Pro.
Do I need separate AWS credentials?
No. AI Gateway handles authentication with Amazon Bedrock. You only need a gateway API key or OIDC token.