
Kimi K2 Thinking

moonshotai/kimi-k2-thinking

Kimi K2 Thinking adds extended chain-of-thought (CoT) reasoning to the K2 architecture, supporting many sequential tool calls for agentic workflows through AI Gateway.

Reasoning · Tool Use · Implicit Caching
index.ts

```typescript
import { streamText } from 'ai'

const result = streamText({
  model: 'moonshotai/kimi-k2-thinking',
  prompt: 'Why is the sky blue?',
})

// Print the response as it streams in.
for await (const part of result.textStream) {
  process.stdout.write(part)
}
```

What To Consider When Choosing a Provider

  • Zero Data Retention

    AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.

  • Authentication

    AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.

Reasoning traces increase output length, so budget planning should account for higher output token use relative to non-thinking K2 variants. Completions support up to 262.1K tokens per request.
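As a rough illustration of that budgeting, a simple multiplier-based helper can size an output-token budget against the per-request limit. The multiplier range and the helper itself are illustrative assumptions, not an official formula:

```typescript
// Illustrative helper: size an output-token budget for a thinking model.
// The multiplier is a rule of thumb (traces can be roughly 3-10x the
// final answer), not an official figure.
const REQUEST_LIMIT = 262_144 // ~262.1K tokens per request

function estimateOutputBudget(
  baselineAnswerTokens: number,
  reasoningMultiplier = 5, // assume ~5x trace overhead by default
): number {
  const budget = Math.ceil(baselineAnswerTokens * reasoningMultiplier)
  // Never request more than the model's per-request limit.
  return Math.min(budget, REQUEST_LIMIT)
}

console.log(estimateOutputBudget(800)) // 4000
console.log(estimateOutputBudget(100_000, 10)) // capped at 262144
```

In practice, measure trace lengths on your own workload and tune the multiplier rather than relying on a fixed default.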

When to Use Kimi K2 Thinking

Best For

  • Visible model reasoning:

    Problems where seeing the model's work matters — debugging complex logic, validating mathematical derivations, auditing decisions

  • Algorithmic exploration:

    Multi-step design where the model must explore and eliminate approaches before settling on a solution

  • Long tool-call chains:

    Agentic sessions requiring sequential tool calls with coherence across the full chain

  • Evaluation and red-teaming:

    Workflows where reasoning traces surface failure modes and edge cases
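The mechanics of a long tool-call chain can be sketched without any network calls: each step feeds the previous result into the next decision, with shared state keeping the chain coherent. Everything below (tool names, the loop, the state map) is a hypothetical stand-in for what an agentic session does, not AI SDK code:

```typescript
// Illustrative simulation of a sequential tool-call chain. Each step's
// output becomes the next step's input, and a shared state map carries
// context across the full chain. Tool names and logic are hypothetical.
type Tool = (input: string, state: Map<string, string>) => string

const tools: Record<string, Tool> = {
  search: (query, state) => {
    state.set('lastQuery', query)
    return `results for "${query}"`
  },
  summarize: (text, state) =>
    `summary of ${text} (query: ${state.get('lastQuery')})`,
}

function runChain(steps: Array<[toolName: string, input: string]>): string[] {
  const state = new Map<string, string>() // coherence across the chain
  const outputs: string[] = []
  for (const [toolName, input] of steps) {
    const previous = outputs.at(-1) ?? input // chain results forward
    outputs.push(tools[toolName](previous, state))
  }
  return outputs
}

const trace = runChain([
  ['search', 'Rayleigh scattering'],
  ['summarize', ''],
])
console.log(trace)
```

The point of the sketch is the data flow: when a model keeps this kind of state coherent across many sequential calls, one session can replace a pipeline that would otherwise be split across several.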

Consider Alternatives When

  • Straightforward tasks:

    Standard Kimi K2 is faster and cheaper for direct-answer tasks that don't benefit from deliberation

  • Hard latency constraints:

    Reasoning traces add significant generation time

  • Output cost sensitivity:

    Thinking traces can multiply output length by 3 to 10x

  • Speed-optimized reasoning:

    Kimi K2 Thinking Turbo trades some reasoning depth for lower latency

Conclusion

Kimi K2 Thinking restructures model output around explicit reasoning. For problems that reward deliberation, the visible chain-of-thought adds an auditable record of the model's logic. Long chains of sequential tool calls extend this into agentic workflows. Reserve it for tasks where the thinking trace earns its token cost. Use non-thinking variants for everything else.

FAQ

What does the model's output contain?

You get two parts: a thinking section with the chain-of-thought trace, and a final answer section with the conclusion. The trace shows problem decomposition, intermediate steps, considered alternatives, and the logical path to the answer. Both sections count toward output token usage.
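If your provider returns the trace inline, the two parts can be separated before display or storage. The `<think>…</think>` delimiter below is a hypothetical convention for illustration only; how the trace is actually delivered depends on the provider and SDK, which may expose it as a separate field instead:

```typescript
// Split a raw response into reasoning trace and final answer.
// The <think>...</think> delimiter is a HYPOTHETICAL convention used
// for illustration; check how your provider actually delivers the trace.
function splitReasoning(raw: string): { reasoning: string; answer: string } {
  const match = raw.match(/<think>([\s\S]*?)<\/think>/)
  if (!match) return { reasoning: '', answer: raw.trim() }
  return {
    reasoning: match[1].trim(),
    answer: raw.replace(match[0], '').trim(),
  }
}

const { reasoning, answer } = splitReasoning(
  '<think>Shorter wavelengths scatter more.</think>The sky is blue due to Rayleigh scattering.',
)
console.log(reasoning) // Shorter wavelengths scatter more.
console.log(answer) // The sky is blue due to Rayleigh scattering.
```

Note that both parts count toward output tokens regardless of whether you display the trace.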

Which problems benefit most from extended reasoning?

Multi-step proofs, debugging where the root cause isn't immediately apparent, algorithmic optimization with competing approaches, and problems where the model needs to try and discard wrong paths. Simple factual questions and routine code generation often don't justify the added cost and latency.

How long are the reasoning traces?

Length varies with problem difficulty. A moderately complex coding problem might produce 500 to 1,000 tokens of reasoning. A hard mathematical proof or multi-step debugging session can generate 3,000 to 5,000+ tokens. The model scales its deliberation to the perceived difficulty of the task.

Can reasoning traces be used to evaluate the model?

Yes. Traces show where the model reasons correctly, where it makes assumptions, and where it backtracks. You can check whether the model reached a correct answer through step-by-step reasoning or pattern matching, which helps on domain-specific tasks.

How do long tool-call chains help agentic workflows?

Each tool call is a reasoning decision: the model decides what to call, interprets the result, and picks the next step. Long chains let the model keep coherent task reasoning across more steps than many models support, so you can run automation pipelines that would otherwise need multiple sessions.

Can the reasoning trace be disabled?

No. It always produces a reasoning trace. For direct answers without traces, use standard Kimi K2 or Kimi K2-0905. They share the same K2 architecture without the deliberative reasoning layer.