Kimi K2

moonshotai/kimi-k2

Kimi K2 is Moonshot AI's Mixture-of-Experts (MoE) language model with one trillion total parameters, of which 32 billion are active per forward pass. It has a context window of 131.1K tokens and is available through AI Gateway via the parasail and novita providers.

Tool Use
index.ts
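import { streamText } from 'ai'

const result = streamText({
  model: 'moonshotai/kimi-k2',
  prompt: 'Why is the sky blue?',
})

// Print the response as it streams in
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}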

What To Consider When Choosing a Provider

  • Zero Data Retention

    AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.

  • Authentication

    AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
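A minimal sketch of choosing between the two credential types might look like the following. The environment variable names (`AI_GATEWAY_API_KEY`, `VERCEL_OIDC_TOKEN`), the bearer-token header shape, the preference for the API key when both are set, and the helper names are assumptions made for illustration; confirm them against the AI Gateway documentation before relying on them.

```typescript
// Hypothetical helpers: the env var names and the "API key first" order
// are assumptions for illustration, not confirmed by this page.
type Env = Record<string, string | undefined>;

interface GatewayCredential {
  kind: 'api-key' | 'oidc';
  token: string;
}

function resolveGatewayCredential(env: Env): GatewayCredential {
  const apiKey = env['AI_GATEWAY_API_KEY'];
  if (apiKey) return { kind: 'api-key', token: apiKey };

  const oidcToken = env['VERCEL_OIDC_TOKEN'];
  if (oidcToken) return { kind: 'oidc', token: oidcToken };

  throw new Error('No AI Gateway credential found');
}

// Builds a bearer-style Authorization header from whichever credential exists.
function authHeader(env: Env): Record<string, string> {
  const credential = resolveGatewayCredential(env);
  return { Authorization: `Bearer ${credential.token}` };
}
```

Passing the environment in as an argument, rather than reading a global, keeps the helper easy to test and makes it clear that only one gateway credential is involved, with no provider keys.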

Kimi K2 routes across parasail and novita. Choose it when uptime and provider redundancy matter most.
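The idea behind provider failover can be sketched as trying each provider in order and returning the first success. AI Gateway handles routing for you, so this is not its implementation; the `Provider` type, `withFailover` function, and stub providers below are hypothetical and exist only to illustrate the pattern.

```typescript
// Hypothetical types and stubs: AI Gateway performs routing internally,
// so this only illustrates the general failover pattern.
interface Provider {
  name: string;
  call: (prompt: string) => Promise<string>;
}

async function withFailover(
  providers: Provider[],
  prompt: string,
): Promise<{ provider: string; text: string }> {
  const errors: string[] = [];
  for (const provider of providers) {
    try {
      const text = await provider.call(prompt);
      return { provider: provider.name, text };
    } catch (err) {
      // Record the failure and fall through to the next provider.
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`${provider.name}: ${message}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join('; ')}`);
}

// Stub providers standing in for real network calls.
const demoProviders: Provider[] = [
  { name: 'parasail', call: async () => { throw new Error('simulated outage'); } },
  { name: 'novita', call: async (prompt) => `echo: ${prompt}` },
];
```

With these stubs, `withFailover(demoProviders, 'hi')` skips the simulated outage on the first provider and resolves with the second provider's response.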

When to Use Kimi K2

Best For

  • Agentic pipelines:

    Structured sequences of API calls, data processing, and code synthesis

  • Provider redundancy:

    Deployments where failover across multiple providers matters most

  • K2 architecture baseline:

    Teams evaluating the K2 architecture for the first time who want the original release

  • Broad knowledge at low cost:

    Workloads that benefit from trillion-parameter knowledge breadth at roughly the inference cost of a 32B dense model

Consider Alternatives When

  • Chain-of-thought traces:

    Kimi K2 Thinking layers extended reasoning on top of this foundation

  • Minimum latency:

    Kimi K2 Turbo is the speed-optimized variant

  • September 2025 checkpoint:

    Use Kimi K2-0905 for expanded context and refined agentic training

  • Multimodal inputs:

    K2 processes text only, so reach for a vision-capable model

Conclusion

Kimi K2 established that sparse expert routing can deliver dense-model responsiveness at trillion-parameter scale. Its architecture anchors the entire K2 family of specialized variants. Routing across parasail and novita gives you automatic failover for high-availability production deployments.

FAQ

The full 1T parameters store broad knowledge, but only ~32B activate per token via the expert router. You pay compute proportional to a 32B dense model while drawing on knowledge encoded across the entire trillion-parameter budget.
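The arithmetic and the routing idea can be sketched in a few lines. The parameter counts come from the text above; the expert count, the value of k, and the router scores are made-up illustration values and do not describe Kimi K2's actual router configuration.

```typescript
// Toy top-k expert routing. The 8 experts, k = 2, and the scores are
// illustration values only, not Kimi K2's real configuration.
interface ExpertScore {
  index: number;
  score: number;
}

// Pick the k experts with the highest router scores.
function topKExperts(scores: number[], k: number): ExpertScore[] {
  return scores
    .map((score, index) => ({ index, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Numerically stable softmax, used to weight the selected experts.
function softmax(xs: number[]): number[] {
  const max = Math.max(...xs);
  const exps = xs.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Fraction of parameters active per token, from the figures in the text.
const TOTAL_PARAMS = 1e12;
const ACTIVE_PARAMS = 32e9;
const activeFraction = ACTIVE_PARAMS / TOTAL_PARAMS; // about 3.2%

const routerScores = [0.1, 2.3, -0.5, 1.7, 0.0, 0.9, -1.2, 0.4];
const chosen = topKExperts(routerScores, 2);
const gateWeights = softmax(chosen.map((e) => e.score));
```

Only the selected experts run for a given token, which is why per-token compute tracks the active parameter count rather than the total.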

Kimi K2 was the first K2 variant adopted across providers, so its availability on both parasail and novita reflects that earlier integration. Later checkpoints and variants can have narrower provider sets.

Kimi K2 is text-only: it accepts and produces text. Multimodal capabilities are not part of this release.

Kimi K2 suits structured multi-step agentic sequences: invoke an API, parse the response, branch on results, call a second API, and synthesize a final output. The function-calling interface in AI Gateway maps directly to these workflows.
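The five steps in that sequence can be sketched as plain control flow. Everything here is a hypothetical stand-in: `fetchWeather` and `fetchAdvice` are local stubs rather than real APIs, and in a real agent the model's tool calls would drive these steps instead of hard-coded logic.

```typescript
// All "APIs" here are local stubs (hypothetical), so the control flow
// can be read and run without network access or a model call.
interface WeatherResponse {
  city: string;
  tempC: number;
}

// Stub for the first API call: returns a raw JSON string.
function fetchWeather(city: string): string {
  return JSON.stringify({ city, tempC: city === 'Oslo' ? -3 : 24 });
}

// Stub for the second API call, selected by the branch in runPipeline.
function fetchAdvice(kind: 'cold' | 'warm'): string {
  return kind === 'cold' ? 'Pack a heavy coat.' : 'Light layers are enough.';
}

function runPipeline(city: string): string {
  const raw = fetchWeather(city); // 1. invoke an API
  const parsed = JSON.parse(raw) as WeatherResponse; // 2. parse the response
  const kind = parsed.tempC < 10 ? 'cold' : 'warm'; // 3. branch on results
  const advice = fetchAdvice(kind); // 4. call a second API
  return `${parsed.city} is ${parsed.tempC} degrees C. ${advice}`; // 5. synthesize
}
```

Keeping each step as its own function mirrors how tools are typically exposed to a model: one narrowly scoped function per capability, with the branching decided between calls.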

AI Gateway supports Bring Your Own Key (BYOK) for providers where you hold a direct account. BYOK requests are excluded from Zero Data Retention (ZDR) coverage.