Kimi K2 Turbo

Kimi K2 Turbo is Moonshot AI's throughput-oriented K2 variant. It runs the K2 Mixture-of-Experts (MoE) architecture without a thinking phase and is built for streaming interfaces, high-volume pipelines, and agentic workflows where first-token latency drives responsiveness.

Streaming
index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'moonshotai/kimi-k2-turbo',
  prompt: 'Why is the sky blue?',
})

// Print tokens to stdout as they arrive
for await (const part of result.textStream) {
  process.stdout.write(part)
}

Frequently Asked Questions

  • What is the throughput advantage of Kimi K2 Turbo over other K2 variants?

    It skips the extended thinking phase used by K2 Thinking and K2 Thinking Turbo, so the entire generation budget goes directly to token output. Check this page for live throughput and latency figures.

  • Does Kimi K2 Turbo support tool calling without thinking mode?

    Yes. Multi-step tool calling, parallel function execution, and long-horizon tool-use sequences all run in turbo mode without thinking overhead.
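
    A multi-step tool call might look like the sketch below. It assumes the AI SDK 5 API (`tool`, `stepCountIs`) and Zod for the input schema; the `getWeather` tool, its schema, and the stubbed `execute` result are illustrative assumptions, not part of the model docs.

    ```typescript
    import { generateText, tool, stepCountIs } from 'ai'
    import { z } from 'zod'

    // Hypothetical weather tool for illustration; replace execute with a real data source
    const getWeather = tool({
      description: 'Return the current weather for a city',
      inputSchema: z.object({ city: z.string() }),
      execute: async ({ city }) => ({ city, tempC: 18, sky: 'clear' }),
    })

    export async function askWeather(prompt: string): Promise<string> {
      const { text } = await generateText({
        model: 'moonshotai/kimi-k2-turbo',
        prompt,
        tools: { getWeather },
        // Let the model call the tool, read its result, then answer
        stopWhen: stepCountIs(3),
      })
      return text
    }
    ```

    Calling `askWeather('What is the weather in Berlin?')` lets the model invoke the tool and answer from its result, with no thinking tokens emitted between steps.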

  • When was Kimi K2 Turbo launched?

    Moonshot AI launched Kimi K2 Turbo on September 5, 2025. Moonshot AI publishes release notes at https://www.moonshot.ai, and the Kimi K2 technical overview lives at https://moonshotai.github.io/Kimi-K2/. Check this page for AI Gateway pricing, routing, and limits.

  • How does Kimi K2 Turbo differ from Kimi K2 Thinking Turbo?

    Both operate at turbo latency, but Kimi K2 Turbo has no thinking mode. Responses generate directly without deliberation. K2 Thinking Turbo keeps a compressed thinking budget and emits chain-of-thought reasoning at turbo speed. Use Turbo when reasoning isn't needed. Use Thinking Turbo when deliberation improves quality.

  • What is the context window for Kimi K2 Turbo?

    256K tokens, consistent with the K2 family.
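
    A rough pre-flight guard against overflowing the window can be sketched as below; the ~4 characters-per-token heuristic is an assumption for illustration, not a tokenizer.

    ```typescript
    const CONTEXT_WINDOW = 256_000 // tokens, per the K2 family spec

    // Rough heuristic: ~4 characters per token for English text
    export const approxTokens = (s: string): number => Math.ceil(s.length / 4)

    export function fitsContext(prompt: string): boolean {
      return approxTokens(prompt) <= CONTEXT_WINDOW
    }
    ```

    For production use, count tokens with the provider's tokenizer instead of a character heuristic.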

  • How do I start using Kimi K2 Turbo?

    Use the identifier moonshotai/kimi-k2-turbo with any supported interface. AI Gateway manages provider routing automatically.
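
    As a minimal sketch, assuming the AI SDK is installed and an AI Gateway key is configured in the environment:

    ```typescript
    import { generateText } from 'ai'

    // AI Gateway resolves the moonshotai/kimi-k2-turbo identifier to a provider
    export async function ask(prompt: string): Promise<string> {
      const { text } = await generateText({
        model: 'moonshotai/kimi-k2-turbo',
        prompt,
      })
      return text
    }
    ```

    Usage: `const answer = await ask('Why is the sky blue?')`.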