
Kimi K2 Thinking

Kimi K2 Thinking adds extended chain-of-thought (CoT) reasoning to the K2 architecture, supporting many sequential tool calls for agentic workflows through AI Gateway.

Capabilities: Reasoning, Tool Use, Implicit Caching
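index.ts
```typescript
import { streamText } from 'ai'
const result = streamText({
  model: 'moonshotai/kimi-k2-thinking', // AI Gateway model ID
  prompt: 'Why is the sky blue?',
})
// Consume the stream and print the final answer as it arrives
for await (const textPart of result.textStream) {
  process.stdout.write(textPart)
}
```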

Playground

Try out Kimi K2 Thinking by Moonshot AI. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

AI Gateway can route requests across multiple providers. Copy a provider's slug to set your routing preference. Visit the docs for more info. Using a provider means you agree to its terms, listed under Legal.

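| Provider    | Context | Latency (P50 TTFT) | Throughput (P50) | Input   | Output  | Cache read | Release date | Legal          |
|-------------|---------|--------------------|------------------|---------|---------|------------|--------------|----------------|
| Moonshot AI | 262K    | 1.1s               | 22 tps           | $0.60/M | $2.50/M | $0.15/M    | 11/06/2025   | Terms, Privacy |
| DeepInfra   | 216K    | 0.6s               | 56 tps           | $0.47/M | $2.00/M | $0.14/M    | 11/06/2025   | Terms, Privacy |

Prices are per million tokens (/M).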
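A rough way to compare the two providers above for a given workload is to price the expected token mix at each provider's listed rates and drop any provider whose context window is too small. The sketch below hard-codes the table's prices. The provider slugs `moonshotai` and `deepinfra` and the exact context limits (262,144 and 216,000 tokens, read from the rounded "262K" and "216K" figures) are assumptions, and `rankProviders` is a hypothetical helper, not an AI Gateway API. It ignores latency, throughput, and cache discounts.

```typescript
// Listed per-million-token prices from the provider table.
// Slugs and exact context limits are assumptions for illustration.
interface ProviderQuote {
  slug: string;
  contextTokens: number;
  inputPerM: number; // USD per 1M input tokens
  outputPerM: number; // USD per 1M output tokens
}

const QUOTES: ProviderQuote[] = [
  { slug: "moonshotai", contextTokens: 262_144, inputPerM: 0.6, outputPerM: 2.5 },
  { slug: "deepinfra", contextTokens: 216_000, inputPerM: 0.47, outputPerM: 2.0 },
];

// Estimated USD cost of one request at a provider's listed rates.
function requestCost(q: ProviderQuote, inputTokens: number, outputTokens: number): number {
  return (inputTokens * q.inputPerM + outputTokens * q.outputPerM) / 1_000_000;
}

// Hypothetical helper: keep providers whose context fits the request,
// then order them from cheapest to most expensive.
function rankProviders(inputTokens: number, outputTokens: number): string[] {
  return QUOTES
    .filter((q) => inputTokens + outputTokens <= q.contextTokens)
    .sort(
      (a, b) =>
        requestCost(a, inputTokens, outputTokens) - requestCost(b, inputTokens, outputTokens),
    )
    .map((q) => q.slug);
}

console.log(rankProviders(10_000, 20_000)); // [ 'deepinfra', 'moonshotai' ]
console.log(rankProviders(200_000, 30_000)); // [ 'moonshotai' ]
```

An ordered list like this is the kind of preference the provider slugs are meant for. The AI Gateway docs describe passing such a list as `providerOptions.gateway.order`; check them for the current option names before relying on it.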
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.

More models by Moonshot AI

| Context | Latency (P50 TTFT) | Throughput (P50) | Input   | Output  | Cache read | Providers                                | Release date |
|---------|--------------------|------------------|---------|---------|------------|------------------------------------------|--------------|
| 262K    | 1.4s               | 79 tps           | $0.95/M | $4.00/M | $0.16/M    | Fireworks, Moonshot AI, Novita           | 04/20/2026   |
| 262K    | 0.4s               | 62 tps           | $0.50/M | $2.80/M | $0.10/M    | Bedrock, Fireworks, Moonshot AI, +2 more | 01/26/2026   |
| 262K    | 0.8s               | 118 tps          | $1.15/M | $8.00/M | $0.15/M    | Moonshot AI                              | 11/06/2025   |
| 131K    | 1.4s               | 24 tps           | $0.57/M | $2.30/M |            | Novita                                   | 09/05/2025   |
| 256K    | 0.7s               | 69 tps           | $1.15/M | $8.00/M | $0.15/M    | Moonshot AI                              | 09/05/2025   |

About Kimi K2 Thinking

Standard language models produce answers directly. Input goes in, output comes out, and whatever reasoning occurred stays invisible. Kimi K2 Thinking changes the output structure. Before generating its final answer, the model produces an explicit chain-of-thought (CoT) trace: a written record of how it decomposes the problem, what options it considers, and how it reaches its conclusion.

This isn't a prompting trick. The thinking behavior is trained into the model. When K2 Thinking encounters a hard problem, its reasoning trace can run for hundreds or thousands of tokens as the model works through sub-problems, backtracks from dead ends, and synthesizes intermediate results. The final answer follows the trace.

Two practical consequences follow. First, step-by-step decomposition helps on problems that benefit from it: multi-step mathematical proofs, algorithmic design, and debugging sessions where the root cause isn't obvious. Second, the reasoning trace is also an output you can log, audit, or use in evaluations.
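Treating the trace as its own output usually means separating it from the answer on the client, for example to log the trace while showing users only the answer. The sketch below assumes the response arrives as an ordered list of typed parts, similar in spirit to the `reasoning` and `text` content parts the AI SDK exposes. The `ResponsePart` type and `splitResponse` helper are illustrative and not taken from any SDK.

```typescript
// Assumed response shape: ordered parts tagged as reasoning or answer text.
type ResponsePart =
  | { type: "reasoning"; text: string }
  | { type: "text"; text: string };

interface SplitResponse {
  reasoning: string; // chain-of-thought trace, e.g. for logs or evals
  answer: string; // final answer shown to the user
}

// Concatenate each kind of part in order, keeping the two streams apart.
function splitResponse(parts: ResponsePart[]): SplitResponse {
  let reasoning = "";
  let answer = "";
  for (const part of parts) {
    if (part.type === "reasoning") reasoning += part.text;
    else answer += part.text;
  }
  return { reasoning, answer };
}

const parts: ResponsePart[] = [
  { type: "reasoning", text: "Rayleigh scattering grows as 1/λ^4, " },
  { type: "reasoning", text: "so shorter wavelengths scatter more." },
  { type: "text", text: "Blue light scatters most, so the sky looks blue." },
];
console.log(splitResponse(parts).answer);
// Blue light scatters most, so the sky looks blue.
```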

K2 Thinking supports long chains of sequential tool calls within a single agentic session. The model reasons about what tool to call next, observes the result, reasons about the implications, and continues. It maintains coherent task state across more interaction steps than many non-thinking models handle.
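The reason, act, observe cycle described above can be sketched as a loop with a step cap. In this sketch the model is replaced by a hypothetical `policy` function so the control flow runs offline; in a real integration that decision would come from K2 Thinking's response through an SDK's tool-calling support. The tool names, the `maxSteps` cap, and the state shape are illustrative assumptions.

```typescript
// One decision from the (here simulated) model: call a tool or finish.
type Decision =
  | { kind: "call"; tool: string; input: number }
  | { kind: "finish"; answer: string };

type Tool = (input: number) => number;

interface Observation {
  tool: string;
  input: number;
  output: number;
}

interface AgentState {
  observations: Observation[];
}

// Hypothetical tools for illustration.
const tools: Record<string, Tool | undefined> = {
  double: (x) => x * 2,
  addThree: (x) => x + 3,
};

// Decide, call the tool, record the observation, repeat until done or capped.
function runAgent(
  policy: (state: AgentState) => Decision,
  maxSteps: number,
): { answer: string; steps: number } {
  const state: AgentState = { observations: [] };
  for (let step = 0; step < maxSteps; step++) {
    const decision = policy(state);
    if (decision.kind === "finish") {
      return { answer: decision.answer, steps: step };
    }
    const tool = tools[decision.tool];
    if (tool === undefined) throw new Error(`unknown tool: ${decision.tool}`);
    const output = tool(decision.input);
    // Keep every observation so later decisions can build on earlier results.
    state.observations.push({ tool: decision.tool, input: decision.input, output });
  }
  throw new Error(`no answer after ${maxSteps} steps`);
}

// Simulated policy: start at 5, double it, add three, then report.
const policy = (state: AgentState): Decision => {
  const obs = state.observations;
  const last: Observation | undefined = obs.length > 0 ? obs[obs.length - 1] : undefined;
  if (last === undefined) return { kind: "call", tool: "double", input: 5 };
  if (last.tool === "double") return { kind: "call", tool: "addThree", input: last.output };
  return { kind: "finish", answer: `result is ${last.output}` };
};

console.log(runAgent(policy, 10)); // { answer: 'result is 13', steps: 2 }
```

The step cap matters more as chains get longer: it bounds cost and latency if the model never decides to finish.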

The model is open source under Moonshot AI's license terms.

Kimi K2 Thinking is available through AI Gateway at $0.60 per million input tokens and $2.50 per million output tokens from Moonshot AI, or $0.47 and $2.00 per million from DeepInfra.
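As a worked example at the Moonshot AI rates ($0.60/M input, $2.50/M output), take a hypothetical request with 2,000 input tokens whose response contains a 3,000-token reasoning trace and a 500-token answer. Both parts of the response are billed as output:

```latex
C = \frac{2000 \times 0.60 + (3000 + 500) \times 2.50}{10^{6}}\ \text{USD}
  = \frac{1200 + 8750}{10^{6}}\ \text{USD}
  = 0.00995\ \text{USD}
```

Here the trace alone accounts for 7,500 of the 9,950 millionths of a dollar, roughly three quarters of the request's cost.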

What To Consider When Choosing a Provider

  • Configuration: Reasoning traces increase output length, so budget for higher output token use than with non-thinking K2 variants. Requests support up to 262.1K tokens of context through Moonshot AI (216K through DeepInfra).
  • Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
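To act on the budget advice above, one rough approach is to scale the answer length you expect by a trace multiplier and check that prompt plus worst-case output still fits the context window. The default 3x to 10x range comes from the "Consider Alternatives" list on this page; treating 262.1K as exactly 262,144 tokens is an assumption, and `planOutputBudget` is a hypothetical helper.

```typescript
// Assumed exact value of the 262.1K-token context window.
const CONTEXT_TOKENS = 262_144;

interface BudgetPlan {
  lowOutput: number; // answer tokens × low trace multiplier
  highOutput: number; // answer tokens × high trace multiplier
  fitsContext: boolean; // prompt + worst-case output within the window
}

// Estimate total output (trace + answer) from the expected answer length.
function planOutputBudget(
  promptTokens: number,
  answerTokens: number,
  lowMultiplier = 3,
  highMultiplier = 10,
): BudgetPlan {
  const lowOutput = answerTokens * lowMultiplier;
  const highOutput = answerTokens * highMultiplier;
  return {
    lowOutput,
    highOutput,
    fitsContext: promptTokens + highOutput <= CONTEXT_TOKENS,
  };
}

console.log(planOutputBudget(8_000, 1_500));
// { lowOutput: 4500, highOutput: 15000, fitsContext: true }
```

Planning against the high multiplier is deliberately conservative; measured trace lengths on your own traffic are a better basis once you have them.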

When to Use Kimi K2 Thinking

Best For

  • Visible model reasoning: Problems where seeing the model's work matters, such as debugging complex logic, validating mathematical derivations, and auditing decisions
  • Algorithmic exploration: Multi-step design where the model must explore and eliminate approaches before settling on a solution
  • Long tool-call chains: Agentic sessions requiring sequential tool calls with coherence across the full chain
  • Evaluation and red-teaming: Workflows where reasoning traces surface failure modes and edge cases
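For evaluation tasks with known intermediate results, one lightweight audit is to check whether those results appear in the trace. A correct final answer whose trace never mentions the required steps is a candidate for manual review. This is a hypothetical string-matching heuristic (`auditTrace` and its inputs are illustrative); it cannot prove the reasoning is sound, only flag traces worth reading.

```typescript
interface AuditResult {
  answerCorrect: boolean;
  missingSteps: string[]; // expected intermediate results absent from the trace
  needsReview: boolean; // correct answer but incomplete visible reasoning
}

// Hypothetical heuristic: compare a trace against expected intermediate results.
function auditTrace(
  trace: string,
  answer: string,
  expectedAnswer: string,
  requiredSteps: string[],
): AuditResult {
  const normalizedTrace = trace.toLowerCase();
  const missingSteps = requiredSteps.filter(
    (step) => !normalizedTrace.includes(step.toLowerCase()),
  );
  const answerCorrect = answer.trim() === expectedAnswer.trim();
  return { answerCorrect, missingSteps, needsReview: answerCorrect && missingSteps.length > 0 };
}

// Example: 17 × 24 = 408, via 17 × 20 = 340 and 17 × 4 = 68.
const audit = auditTrace("17 x 20 = 340, then add 68.", "408", "408", ["340", "68"]);
console.log(audit.needsReview); // false
```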

Consider Alternatives When

  • Straightforward tasks: Standard Kimi K2 is faster and cheaper for direct-answer tasks that don't benefit from deliberation
  • Hard latency constraints: Reasoning traces add significant generation time
  • Output cost sensitivity: Thinking traces can make output 3 to 10 times longer
  • Speed-optimized reasoning: Kimi K2 Thinking Turbo trades some reasoning depth for lower latency

Conclusion

Kimi K2 Thinking restructures model output around explicit reasoning. For problems that reward deliberation, the visible chain-of-thought adds an auditable record of the model's logic. Long chains of sequential tool calls extend this into agentic workflows. Reserve it for tasks where the thinking trace earns its token cost. Use non-thinking variants for everything else.

Frequently Asked Questions

  • How does the reasoning trace change what I get back from the API?

    You get two parts: a thinking section with the chain-of-thought trace, and a final answer section with the conclusion. The trace shows problem decomposition, intermediate steps, considered alternatives, and the logical path to the answer. Both sections count toward output token usage.

  • What kinds of problems benefit most from the thinking mode?

    Multi-step proofs, debugging where the root cause isn't immediately apparent, algorithmic optimization with competing approaches, and problems where the model needs to try and discard wrong paths. Simple factual questions and routine code generation often don't justify the added cost and latency.

  • How long are the reasoning traces in practice?

    Length varies with problem difficulty. A moderately complex coding problem might produce 500 to 1,000 tokens of reasoning. A hard mathematical proof or multi-step debugging session can generate 3,000 to 5,000+ tokens. The model scales its deliberation to the perceived difficulty of the task.

  • Can I use reasoning traces for model evaluation and quality assurance?

    Yes. Traces show where the model reasons correctly, where it makes assumptions, and where it backtracks. You can check whether the model reached a correct answer through step-by-step reasoning or through pattern matching, which matters on domain-specific tasks where a correct answer alone does not show that the method was sound.

  • What makes long tool-call chains important for reasoning workflows?

    Each tool call is a reasoning decision: the model decides what to call, interprets the result, and picks the next step. Long chains let the model keep coherent task reasoning across more steps than many models support, so you can run automation pipelines that would otherwise need multiple sessions.

  • Does K2 Thinking always produce a reasoning trace, or can I turn it off?

    It always produces a reasoning trace. For direct answers without traces, use standard Kimi K2 or Kimi K2-0905, which share the K2 architecture but are not trained to produce reasoning traces.
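Since the trace is always generated, it is worth tracking its size on your own traffic rather than relying on the ranges quoted in this FAQ. The sketch below summarizes logged usage records. The `UsageRecord` shape is an assumption (the AI SDK's usage object can report a separate reasoning-token count, but check which fields your provider returns), and `summarizeTraces` is a hypothetical helper.

```typescript
// Assumed per-request log record; field names are illustrative.
interface UsageRecord {
  outputTokens: number; // total billed output, trace included
  reasoningTokens: number; // tokens spent on the reasoning trace
}

interface TraceSummary {
  medianReasoningTokens: number;
  reasoningShare: number; // fraction of all output tokens spent on reasoning
}

function summarizeTraces(records: UsageRecord[]): TraceSummary {
  if (records.length === 0) throw new Error("no records");
  // Median of reasoning-token counts (average of the middle two if even).
  const sorted = records.map((r) => r.reasoningTokens).sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const medianReasoningTokens =
    sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  // Share of all billed output that went to reasoning across the records.
  const totalOutput = records.reduce((sum, r) => sum + r.outputTokens, 0);
  const totalReasoning = records.reduce((sum, r) => sum + r.reasoningTokens, 0);
  return { medianReasoningTokens, reasoningShare: totalReasoning / totalOutput };
}

console.log(
  summarizeTraces([
    { outputTokens: 1_000, reasoningTokens: 700 },
    { outputTokens: 5_000, reasoningTokens: 4_000 },
    { outputTokens: 2_000, reasoningTokens: 1_300 },
  ]),
);
// { medianReasoningTokens: 1300, reasoningShare: 0.75 }
```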