Kimi K2
Kimi K2 is Moonshot AI's Mixture-of-Experts (MoE) language model. It has one trillion total parameters, of which 32 billion are active per forward pass, and a context window of 131.1K tokens. It is available through AI Gateway via two providers: parasail and novita.
import { streamText } from 'ai'
const result = streamText({
  model: 'moonshotai/kimi-k2',
  prompt: 'Why is the sky blue?',
})
What To Consider When Choosing a Provider
Zero Data Retention
AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
Authentication
AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
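As a rough illustration of the two credential types mentioned above, the sketch below picks whichever one is present in an environment map. The variable names `AI_GATEWAY_API_KEY` and `VERCEL_OIDC_TOKEN`, the preference for the API key, and the `Bearer` header shape are assumptions for this example, not details stated on this page. The helpers `resolveGatewayCredential` and `authHeader` are hypothetical.

```typescript
// Sketch only: env var names, precedence, and header shape are assumptions.
type GatewayCredential =
  | { kind: 'api-key'; token: string }
  | { kind: 'oidc'; token: string };

// Pick a credential from an environment-like map (e.g. process.env).
function resolveGatewayCredential(
  env: Record<string, string | undefined>,
): GatewayCredential {
  const apiKey = env['AI_GATEWAY_API_KEY'];
  if (apiKey) return { kind: 'api-key', token: apiKey };

  const oidcToken = env['VERCEL_OIDC_TOKEN'];
  if (oidcToken) return { kind: 'oidc', token: oidcToken };

  throw new Error('No AI Gateway credential found in the environment');
}

// Build a bearer-style header from whichever credential was found.
function authHeader(credential: GatewayCredential): Record<string, string> {
  return { Authorization: `Bearer ${credential.token}` };
}
```

Either way, only a gateway credential is involved; no provider-specific key for parasail or novita appears in the application code.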
Kimi K2 routes across two providers, parasail and novita. Choose it when uptime and provider redundancy matter most.
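If you want to state a provider preference rather than rely on default routing, one possible shape is sketched below. It assumes the gateway reads an `order` list (and optionally an `only` list) under `providerOptions.gateway`; treat those field names as assumptions to confirm against the AI Gateway documentation. `buildRoutingOptions` is a hypothetical helper.

```typescript
// Sketch only: the `order` / `only` field names are assumptions.
type GatewayRouting = { order: string[]; only?: string[] };

// Build a provider-preference object, dropping duplicates but keeping order.
function buildRoutingOptions(
  preferred: string[],
  restrict: boolean = false,
): { gateway: GatewayRouting } {
  const order = preferred.filter((p, i) => preferred.indexOf(p) === i);
  if (order.length === 0) {
    throw new Error('At least one provider is required');
  }
  // With restrict=true, routing is limited to the listed providers.
  return { gateway: restrict ? { order, only: order } : { order } };
}

// Prefer parasail, fall back to novita.
const providerOptions = buildRoutingOptions(['parasail', 'novita']);
```

Under that assumption, the resulting object would be passed as the `providerOptions` field of the `streamText` call.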
When to Use Kimi K2
Best For
Agentic pipelines:
Structured sequences of API calls, data processing, and code synthesis
Provider redundancy:
Deployments where failover across multiple providers matters most
K2 architecture baseline:
Teams evaluating the K2 architecture for the first time who want the original release
Broad knowledge at low cost:
Workloads that benefit from trillion-parameter knowledge breadth at roughly the inference cost of a 32B dense model
Consider Alternatives When
Chain-of-thought traces:
Kimi K2 Thinking layers extended reasoning on top of this foundation
Minimum latency:
Kimi K2 Turbo is the speed-optimized variant
September 2025 checkpoint:
Use Kimi K2-0905 for expanded context and refined agentic training
Multimodal inputs:
K2 processes text only, so reach for a vision-capable model
Conclusion
Kimi K2 showed that sparse expert routing can deliver dense-model responsiveness at trillion-parameter scale. Its architecture anchors the entire K2 family of specialized variants. Routing across parasail and novita gives you automatic failover for high-availability production.
FAQ
The full 1T parameters store broad knowledge, but only ~32B activate per token via the expert router. You pay compute proportional to a 32B dense model while drawing on knowledge encoded across the entire trillion-parameter budget.
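To make the routing idea concrete, here is a toy top-k gate. It is not Kimi K2's actual router: the scores, the expert count, and `k` are illustrative, and `topKRoute` is a hypothetical function. It shows only the general technique of running a token through a few selected experts and weighting their outputs.

```typescript
// Toy top-k expert gate; sizes and scores are illustrative, not Kimi K2's.
type Routed = { expert: number; weight: number };

function topKRoute(scores: number[], k: number): Routed[] {
  if (k <= 0 || k > scores.length) throw new Error('k out of range');

  // Keep the k highest-scoring experts.
  const top = scores
    .map((score, expert) => ({ expert, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);

  // Softmax over the selected scores only, so the weights sum to 1.
  const max = Math.max(...top.map((t) => t.score));
  const withExp = top.map((t) => ({ expert: t.expert, e: Math.exp(t.score - max) }));
  const total = withExp.reduce((sum, t) => sum + t.e, 0);
  return withExp.map((t) => ({ expert: t.expert, weight: t.e / total }));
}

// Share of parameters active per token, from the figures quoted above.
const activeFraction = 32e9 / 1e12; // about 3.2%
```

Experts that are not selected do no work for that token, which is why compute tracks the active parameter count rather than the total.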
It was the first K2 variant adopted across providers, so its availability on both parasail and novita reflects that earlier integration. Later checkpoints and variants can have narrower provider sets.
Yes. Kimi K2 accepts and produces text. Multimodal capabilities are not part of this release.
Structured multi-step sequences: invoke an API, parse the response, branch on results, call a second API, and synthesize a final output. The function-calling interface in AI Gateway maps directly to these workflows.
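The control flow described above can be sketched without any model call. In the example below the tool names (`lookupOrder`, `notifyCustomer`), the `ToolCall` shape, and the stubbed results are all hypothetical. In a real pipeline the model would emit the tool calls, whereas here they are hard-coded to show the invoke, parse, branch, second call, and synthesize steps.

```typescript
// Sketch only: tools are local stubs and the call sequence is hard-coded.
type ToolCall = { name: string; args: Record<string, unknown> };
type Tool = (args: Record<string, unknown>) => Promise<unknown>;

const tools: Record<string, Tool> = {
  // Stub for a first API: always reports the order as delayed.
  lookupOrder: async (args) => ({ orderId: args['orderId'], status: 'delayed' }),
  // Stub for a second API: pretends to send a notification.
  notifyCustomer: async (args) => ({ sent: true, reason: args['reason'] }),
};

// Dispatch a tool call by name, failing loudly on unknown tools.
async function runTool(call: ToolCall): Promise<unknown> {
  const tool = tools[call.name];
  if (!tool) throw new Error(`Unknown tool: ${call.name}`);
  return tool(call.args);
}

async function handleOrder(orderId: string): Promise<string> {
  // Step 1: invoke the first API and parse its response.
  const order = (await runTool({ name: 'lookupOrder', args: { orderId } })) as {
    status: string;
  };

  // Step 2: branch on the result.
  if (order.status !== 'delayed') {
    return `Order ${orderId} is ${order.status}; no action taken.`;
  }

  // Step 3: call a second API.
  const notice = (await runTool({
    name: 'notifyCustomer',
    args: { orderId, reason: 'delayed' },
  })) as { sent: boolean };

  // Step 4: synthesize a final output.
  return `Order ${orderId} is delayed; customer notified: ${notice.sent}.`;
}
```

Keeping the dispatcher separate from the workflow means new tools can be added to the registry without touching the step logic.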
Yes. AI Gateway supports Bring Your Own Key for providers where you hold a direct account. BYOK requests are excluded from ZDR coverage.