Nova Micro launched alongside the rest of the first-generation Nova family, but its design philosophy differs. Nova Lite and Nova Pro layer in image and video understanding; Nova Micro drops multimodal support entirely. The result is a model that does one thing, text processing, at high speed and low cost.
The tradeoff is intentional. Dropping vision processing lets the architecture be tuned for generation throughput while keeping a 128K-token context window. Per-request costs stay low even in high-volume classification or tagging pipelines.
Many teams deploy Nova Micro as the default tier in a routing architecture. Straightforward text requests (classification, entity extraction, simple Q&A, short summaries) go to Micro. Only when a request involves images, requires deep reasoning, or exceeds the 128K-token context window does it escalate to Lite, Pro, or a second-generation model. This pattern keeps average cost per request low while covering the full range of task complexity.
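The routing pattern above can be sketched as a small dispatch function. This is a minimal illustration, not a production router: the `Request` fields are hypothetical, and while the model IDs follow first-generation Nova naming conventions on Bedrock, they should be verified against your region's model catalog. The choice of escalation tier for each condition is deployment-specific.

```python
from dataclasses import dataclass

# Assumed model IDs following Nova naming conventions; verify in your region.
MICRO = "amazon.nova-micro-v1:0"   # text-only, fast, cheap default tier
LITE = "amazon.nova-lite-v1:0"     # adds image/video understanding
PRO = "amazon.nova-pro-v1:0"       # deeper reasoning

MICRO_CONTEXT_TOKENS = 128_000     # Micro's context window

@dataclass
class Request:
    text: str
    token_count: int               # precomputed prompt size (hypothetical field)
    has_images: bool = False
    needs_deep_reasoning: bool = False

def route(req: Request) -> str:
    """Send straightforward text to Micro; escalate everything else."""
    if req.has_images:
        return LITE                # Micro has no multimodal support
    if req.needs_deep_reasoning:
        return PRO
    if req.token_count > MICRO_CONTEXT_TOKENS:
        return PRO                 # exceeds Micro's window; tier choice is yours
    return MICRO

# A classification request stays on the cheap default tier.
print(route(Request("Label this ticket: billing or tech?", token_count=40)))
```

The key design choice is that escalation is the exception path: every condition must fail before a request leaves the cheap tier, so average cost tracks the Micro rate as long as most traffic is plain text.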