
Kimi K2 Turbo

moonshotai/kimi-k2-turbo

Kimi K2 Turbo is Moonshot AI's throughput-oriented K2 variant. It runs the K2 Mixture-of-Experts (MoE) architecture without thinking overhead, built for streaming interfaces, high-volume pipelines, and agentic workflows where first-token latency drives responsiveness.

Tool Use
index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'moonshotai/kimi-k2-turbo',
  prompt: 'Why is the sky blue?',
})

// Stream tokens to stdout as they arrive.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}

What To Consider When Choosing a Provider

  • Zero Data Retention

    AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.

  • Authentication

    AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.

For agentic pipelines that need the lowest first-token latency, verify provider response time benchmarks for your deployment region.
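A minimal sketch of the fail-fast credential check implied above. It assumes `AI_GATEWAY_API_KEY` is the environment variable the gateway expects; confirm the exact name in the AI Gateway documentation:

```typescript
// Minimal sketch: fail fast when gateway credentials are missing.
// AI_GATEWAY_API_KEY is assumed to be the conventional variable name;
// on Vercel deployments an OIDC token can stand in for it instead.
function resolveGatewayKey(env: Record<string, string | undefined>): string {
  const key = env.AI_GATEWAY_API_KEY
  if (!key) {
    throw new Error('Set AI_GATEWAY_API_KEY (or use OIDC on Vercel deployments)')
  }
  return key
}

// Example: pass process.env in a real app.
console.log(resolveGatewayKey({ AI_GATEWAY_API_KEY: 'example-key' }))
```

Checking once at startup surfaces misconfiguration immediately rather than on the first model call.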

When to Use Kimi K2 Turbo

Best For

  • Real-time streaming chat:

    First-token latency drives perceived responsiveness in chat interfaces. No thinking overhead means the first token arrives sooner, which users notice immediately

  • High-frequency tool-calling agents:

    Agents that execute many sequential or parallel tool calls benefit from the per-call latency reduction. A 100-step agentic workflow is faster at turbo latency than thinking latency

  • Sub-agents in multi-agent orchestration:

    When K2 Turbo serves as a worker node in a larger agentic system, its response time affects overall orchestration throughput. Fast sub-agents keep the pipeline moving

  • Cost-sensitive high-volume production:

    Lower latency often correlates with lower compute cost at scale. Kimi K2 Turbo delivers K2-level capability at a throughput-oriented configuration
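The arithmetic behind the 100-step claim above can be sketched as follows. The latency figures are illustrative assumptions, not measured benchmarks; check live figures for your region:

```typescript
// Hypothetical per-call latencies for illustration only.
const TURBO_LATENCY_S = 0.5    // assumed first-token latency, turbo mode
const THINKING_LATENCY_S = 2.0 // assumed per-call latency with deliberation

// For a sequential agentic workflow, per-call latency compounds linearly.
function workflowLatency(steps: number, perCallLatencyS: number): number {
  return steps * perCallLatencyS
}

const turboTotal = workflowLatency(100, TURBO_LATENCY_S)       // 50 s
const thinkingTotal = workflowLatency(100, THINKING_LATENCY_S) // 200 s
console.log(`turbo: ${turboTotal}s, thinking: ${thinkingTotal}s`)
```

Even a modest per-call saving multiplies across every step of a long-horizon agent, which is why turbo mode matters most in sequential workflows.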

Consider Alternatives When

  • The task requires explicit reasoning steps:

    When chain-of-thought deliberation improves output quality, Kimi K2 Thinking or K2 Thinking Turbo is more appropriate

  • Complex multi-step planning is central to the workflow:

    Tasks where the model needs to plan before acting benefit from the thinking variants' deliberation budget

  • You're building a reasoning benchmark or evaluation:

    On benchmarks that reward explicit deliberation, the thinking-enabled variants will produce different scores, so evaluate those variants instead
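The choice between the variants above can be sketched as a simple routing helper. The thinking-variant identifier below is a hypothetical placeholder; confirm the exact model slug in the catalog:

```typescript
// Hypothetical routing sketch: the thinking-turbo slug is assumed, not confirmed.
type K2Variant = 'moonshotai/kimi-k2-turbo' | 'moonshotai/kimi-k2-thinking-turbo'

function pickK2Variant(opts: { needsReasoning: boolean }): K2Variant {
  return opts.needsReasoning
    ? 'moonshotai/kimi-k2-thinking-turbo' // deliberation budget for planning tasks
    : 'moonshotai/kimi-k2-turbo'          // lowest latency for direct generation
}

console.log(pickK2Variant({ needsReasoning: false }))
```

Routing per request lets one pipeline serve both latency-sensitive chat turns and planning-heavy steps.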

Conclusion

Kimi K2 Turbo is the right K2 configuration when speed is the binding constraint and chain-of-thought is overhead rather than a benefit. For streaming interfaces, high-frequency agents, and latency-sensitive pipelines, it delivers K2-generation capability at high throughput.

FAQ

What makes Kimi K2 Turbo fast?

It drops the extended thinking overhead of K2 Thinking and K2 Thinking Turbo, so generation time goes directly to token output. Check this page for live throughput and latency figures.

Does Kimi K2 Turbo support tool calling and agentic workflows?

Yes. Multi-step tool calling, parallel function execution, and long-horizon tool-use sequences all run in turbo mode without thinking overhead.

When was Kimi K2 Turbo released?

Moonshot AI launched Kimi K2 Turbo on September 5, 2025, and publishes release notes at https://www.moonshot.ai. Technical details for the K2 family are at https://moonshotai.github.io/Kimi-K2/; check this page for AI Gateway pricing, routing, and limits.

How does Kimi K2 Turbo differ from K2 Thinking Turbo?

Both operate at turbo latency, but Kimi K2 Turbo has no thinking mode: responses generate directly without deliberation. K2 Thinking Turbo keeps a compressed thinking budget and emits chain-of-thought reasoning at turbo speed. Use Turbo when reasoning isn't needed; use Thinking Turbo when deliberation improves quality.

What is Kimi K2 Turbo's context window?

256K tokens, consistent with the K2 family.
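A small budgeting sketch against that window, treating "256K" as 256,000 tokens for illustration (the exact count may differ; check the model spec):

```typescript
// Sketch: budget prompt tokens against the context window, reserving
// headroom for the model's output. 256,000 is an illustrative reading of "256K".
const CONTEXT_WINDOW = 256_000

function remainingOutputBudget(promptTokens: number, reservedTokens = 0): number {
  return Math.max(0, CONTEXT_WINDOW - promptTokens - reservedTokens)
}

console.log(remainingOutputBudget(200_000, 6_000)) // 50000
```

Clamping at zero makes over-budget prompts easy to detect before sending a request that would be rejected.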

How do I access Kimi K2 Turbo through AI Gateway?

Use the identifier moonshotai/kimi-k2-turbo with any supported interface. AI Gateway manages provider routing automatically.