Kimi K2 Turbo is the throughput-maximized configuration of Kimi K2, launched on September 5, 2025. It runs the K2 Mixture-of-Experts (MoE) architecture (1T total parameters, 32B active per forward pass) without the extended thinking layer. All generation capacity goes toward token throughput. When a reasoning model's internal monologue adds latency without adding value, Kimi K2 Turbo removes that overhead entirely.
The K2 MoE architecture keeps agentic capabilities in turbo mode: multi-step tool calling, long-horizon task management, and parallel function execution all operate at the turbo speed profile. The model handles sequences of tool invocations (query an API, process the result, call another API, synthesize a response) without triggering thinking mode. For agentic pipelines where many such sequences run in parallel or in tight loops, the per-step latency reduction compounds into wall-clock savings.
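The tool-invocation sequence described above (query an API, process the result, call another API, synthesize a response) can be sketched as a simple dispatch loop. Everything here is a stand-in: `lookup_weather`, `summarize`, and the dispatcher are hypothetical tools, not part of the Kimi K2 Turbo API; the point is the shape of the multi-step pattern whose per-step latency the turbo profile reduces.

```python
def lookup_weather(city: str) -> dict:
    # Stand-in for the first external API call in the sequence.
    return {"city": city, "temp_c": 21}

def summarize(data: dict) -> str:
    # Stand-in for a second tool that processes the first result.
    return f"{data['city']}: {data['temp_c']} degrees C"

TOOLS = {"lookup_weather": lookup_weather, "summarize": summarize}

def run_agent_step(tool_name: str, args: dict):
    """Dispatch one tool invocation. In a real agentic loop, the model's
    tool-call output would supply tool_name and args at each step."""
    return TOOLS[tool_name](**args)

# One pass through the sequence: query, process, synthesize.
raw = run_agent_step("lookup_weather", {"city": "Berlin"})
answer = run_agent_step("summarize", {"data": raw})
```

Because each `run_agent_step` in a real pipeline is a round trip to the model, shaving latency from every step compounds across long sequences and tight loops.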
Streaming interfaces benefit from lower first-token latency. A chat interface that starts streaming tokens sooner cuts the wait before visible output, while one that must wait for a thinking model to finish deliberating before streaming cannot. Kimi K2 Turbo targets cases where streaming latency defines the product experience.
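Time-to-first-token is the metric that captures this. A minimal sketch of how a client might measure it, using a toy generator in place of a real streaming response (the delay and token list are illustrative, not measured values):

```python
import time

def stream_tokens(first_token_delay: float, tokens):
    """Toy stand-in for a streaming response: sleeps to simulate
    time-to-first-token, then yields tokens as they 'arrive'."""
    time.sleep(first_token_delay)
    for t in tokens:
        yield t

def time_to_first_token(stream) -> float:
    """Block until the first token arrives and return the elapsed seconds."""
    start = time.monotonic()
    next(stream)
    return time.monotonic() - start

ttft = time_to_first_token(stream_tokens(0.05, ["Hello", ",", " world"]))
```

For a thinking model, the deliberation phase lands entirely inside that first measured interval, which is why removing it moves the number users actually feel.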
Kimi K2 Turbo is available through AI Gateway at $1.15 per million input tokens and $8 per million output tokens. Release history from Moonshot AI is also available at https://www.moonshot.ai.
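The quoted pricing translates into per-request cost with simple arithmetic; the token counts in the example below are hypothetical:

```python
# AI Gateway pricing quoted above, in dollars per million tokens.
INPUT_PRICE_PER_M = 1.15
OUTPUT_PRICE_PER_M = 8.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of a single request at the quoted rates."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# e.g. a 4,000-token prompt with a 1,000-token completion:
cost = request_cost(4_000, 1_000)  # $0.0046 input + $0.0080 output
```

The output rate dominates for generation-heavy workloads, so throughput-oriented agentic pipelines are priced mostly by what the model writes, not what it reads.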