
LongCat Flash Thinking 2601

LongCat Flash Thinking 2601 is Meituan's upgrade to its LongCat Flash Thinking reasoning model series. It introduces parallel multi-path thinking and noise-resistant tool calling, and it scores 88.2 on τ²-Bench and 100.0 on AIME-25.

Reasoning
index.ts
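import { streamText } from 'ai'
const result = streamText({
  model: 'meituan/longcat-flash-thinking-2601',
  prompt: 'Why is the sky blue?',
})
// Consume the stream and print text chunks as they arrive
for await (const textPart of result.textStream) {
  process.stdout.write(textPart)
}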

Frequently Asked Questions

  • What is the Re-thinking Mode in LongCat Flash Thinking 2601?

    Re-thinking Mode runs several independent reasoning paths in parallel. A summary-synthesis stage then consolidates the findings from all paths into a final answer. Because intermediate hypotheses are spread across paths, a single flawed chain is less likely to dominate the output.

  • How does 2601 improve on the original LongCat Flash Thinking?

    It adds parallel multi-path reasoning, noise-injected training for robustness, and benchmark gains: τ²-Bench increased to 88.2, AIME-25 reached 100.0, IMO-AnswerBench scored 86.8, and BrowseComp reached 73.1.

  • What is noise-resistant tool calling and why does it matter?

    It means the model was trained on tool outputs that include failures, malformed payloads, and missing fields, not only clean responses. Meituan used multi-class noise during training to simulate those API conditions. The goal is steadier behavior when agents call unpredictable external services.

  • What are the key benchmark results for 2601?

    τ²-Bench: 88.2; AIME-25: 100.0; IMO-AnswerBench: 86.8; BrowseComp: 73.1. See the technical post for the published tables.

  • Is LongCat Flash Thinking 2601 open-source?

    Yes. The weights and the technical write-up are available through the technical post.

  • How does 2601 differ from LongCat Flash Chat?

    Flash Chat is a conversational model that answers directly, without extended thinking, and is optimized for speed and tool calling. 2601 runs deep reasoning chains, including parallel multi-path synthesis, so it suits tasks that need deliberate analysis more than fast conversational replies.
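The Re-thinking Mode answer above describes a structure where several reasoning paths run in parallel and a synthesis stage consolidates them. The TypeScript sketch below mimics that shape on the client side only, under stated assumptions: the `ReasoningPath` functions are hypothetical stand-ins for real reasoning chains, and `majorityVote` is a simple substitute for the model's summary-synthesis stage. It illustrates the pattern; it is not how Meituan implements Re-thinking Mode inside the model.

```typescript
// Hypothetical client-side sketch of "parallel paths, then synthesize".
// NOT Meituan's internal Re-thinking Mode: the paths and the majority-vote
// synthesis step are illustrative stand-ins.

type ReasoningPath = (question: string) => Promise<string>

// Launch every path concurrently, then pass all answers to a synthesis step.
async function reThink(
  question: string,
  paths: ReasoningPath[],
  synthesize: (answers: string[]) => string,
): Promise<string> {
  const answers = await Promise.all(paths.map((path) => path(question)))
  return synthesize(answers)
}

// Stand-in synthesis: return the answer most paths agree on
// (ties go to the answer seen first).
function majorityVote(answers: string[]): string {
  if (answers.length === 0) throw new Error('no answers to synthesize')
  const counts = new Map<string, number>()
  for (const a of answers) counts.set(a, (counts.get(a) ?? 0) + 1)
  let best = answers[0]
  counts.forEach((count, answer) => {
    if (count > (counts.get(best) ?? 0)) best = answer
  })
  return best
}

// Usage with stubbed paths: one flawed chain disagrees with the other two.
const stubPaths: ReasoningPath[] = [
  async () => '42',
  async () => '41',
  async () => '42',
]
reThink('What is 6 * 7?', stubPaths, majorityVote).then(console.log) // prints 42
```

With three stubbed paths where one disagrees, the outlier is outvoted instead of being carried into the final answer. A real synthesis stage would weigh the content of each path rather than count identical strings.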
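To make the noise-resistant tool calling answer above more concrete, here is a minimal TypeScript sketch of multi-class noise injection for tool outputs. The noise classes mirror the ones the FAQ names (failures, malformed payloads, missing fields); the names `injectNoise`, `sampleNoise`, and `NOISE_RATES`, the mixing rates, and the specific corruptions are hypothetical assumptions, not Meituan's training pipeline.

```typescript
// Hypothetical sketch of multi-class noise injection for tool outputs.
// The noise classes mirror the FAQ (failures, malformed payloads, missing
// fields); names, rates, and corruptions are assumptions, not Meituan's pipeline.

type NoiseClass = 'clean' | 'failure' | 'malformed' | 'missingField'

// Hypothetical mixing rates: most tool results stay clean.
const NOISE_RATES: Array<[NoiseClass, number]> = [
  ['clean', 0.7],
  ['failure', 0.1],
  ['malformed', 0.1],
  ['missingField', 0.1],
]

// Map a uniform random number in [0, 1) to a noise class.
function sampleNoise(r: number): NoiseClass {
  let cumulative = 0
  for (const [noise, rate] of NOISE_RATES) {
    cumulative += rate
    if (r < cumulative) return noise
  }
  return 'clean' // guards against floating-point rounding near 1
}

// Produce the raw text the model would see as the tool result.
function injectNoise(payload: Record<string, unknown>, noise: NoiseClass): string {
  const text = JSON.stringify(payload)
  switch (noise) {
    case 'failure':
      // Failed call: an error object instead of the expected payload.
      return JSON.stringify({ error: 'upstream service unavailable', status: 503 })
    case 'malformed':
      // Truncated body that no longer parses as JSON.
      return text.slice(0, Math.max(1, Math.floor(text.length / 2)))
    case 'missingField': {
      // Schema drift: the first field is silently dropped.
      const rest: Record<string, unknown> = { ...payload }
      const keys = Object.keys(payload)
      if (keys.length > 0) delete rest[keys[0]]
      return JSON.stringify(rest)
    }
    default:
      return text // 'clean'
  }
}

// Usage: corrupt a weather-tool result with a randomly sampled noise class.
const weather = { city: 'Beijing', tempC: 21 }
console.log(injectNoise(weather, sampleNoise(Math.random())))
```

The idea being illustrated: if a fraction of training-time tool results are corrupted in these ways, the model sees failure shapes similar to those an agent meets when it calls real, unpredictable services.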