What does Lean4 formal proof capability enable in practice?

Lean4 is a formal proof assistant in which mathematical claims are stated and machine-verified. LongCat Flash Thinking scores 67.6 pass@1 on MiniF2F-test, a benchmark of formal mathematical proofs. The model generates formal proofs that Lean4 can check, not just informal natural-language arguments. This applies to theorem proving, formal verification, and rigorous mathematical research.
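
To illustrate what "machine-verified" means here, the following is a minimal Lean4 sketch. It was written for this page and is not taken from MiniF2F or from the model's output. The theorem name `add_comm_example` is made up for illustration, while `Nat.add_comm` is a lemma from Lean's core library. If the proof term did not match the stated claim, Lean would reject the file.

```lean
-- A claim stated formally: addition of natural numbers is commutative.
-- Lean accepts the file only if the proof term really proves the statement.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A concrete arithmetic fact, checked by evaluation.
example : 2 + 2 = 4 := rfl
```

Benchmark problems are considerably harder than this, but the acceptance criterion is the same: a proof counts only if the Lean4 checker accepts it.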

What is the Agentic Reasoning Framework's dual-path inference?

The model autonomously decides whether each task benefits from direct reasoning or tool invocation during the thinking process. Callers don't configure this routing. Meituan reported a 64.5% token efficiency gain in agent tool-use settings while retaining 90% task accuracy.
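
Because the routing happens inside the model, a caller can only observe afterwards which path was taken. The sketch below shows one way to do that, assuming the run is available as a list of steps that each record any tool calls. The `Step` type and the helpers `pathTaken` and `toolUsage` are hypothetical names invented for this example and are not part of any SDK.

```typescript
// Hypothetical shape of one step in an agent run; not an SDK type.
type Step = {
  text: string
  toolCalls: { toolName: string }[]
}

type Path = 'direct' | 'tool'

// Classify a finished run: 'tool' if any step invoked a tool, else 'direct'.
function pathTaken(steps: Step[]): Path {
  return steps.some((step) => step.toolCalls.length > 0) ? 'tool' : 'direct'
}

// Count how often each tool was invoked across the run.
function toolUsage(steps: Step[]): Record<string, number> {
  const counts: Record<string, number> = {}
  for (const step of steps) {
    for (const call of step.toolCalls) {
      counts[call.toolName] = (counts[call.toolName] ?? 0) + 1
    }
  }
  return counts
}

// Two illustrative runs: one answered directly, one that called a tool first.
const directRun: Step[] = [{ text: 'Rayleigh scattering ...', toolCalls: [] }]
const toolRun: Step[] = [
  { text: '', toolCalls: [{ toolName: 'calculator' }] },
  { text: 'The result is 42.', toolCalls: [] },
]

console.log(pathTaken(directRun))
console.log(pathTaken(toolRun))
console.log(toolUsage(toolRun))
```

Logging the path per request is one way to see how often tool invocation is chosen for a given workload, without trying to force either path from the caller's side.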

What are the key benchmark scores for LongCat Flash Thinking?

ARC-AGI: 50.3; LiveCodeBench: 79.4; τ²-Bench: 74.0 (reported at release); MiniF2F-test: 67.6 pass@1 on formal mathematical proof. Full tables are in the [technical post](https://github.com/meituan-longcat/LongCat-Flash-Thinking).

Is LongCat Flash Thinking open-source?

Yes. Weights and licensing are published alongside Meituan's [technical post](https://github.com/meituan-longcat/LongCat-Flash-Thinking).

LongCat Flash Thinking

LongCat Flash Thinking is Meituan's 560B-parameter Mixture-of-Experts (MoE) reasoning model. It pairs Lean4 formal proof capability with agentic tool use in a single model and scores 50.3 on ARC-AGI.

Reasoning, Tool Use

index.ts

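// Stream a completion from LongCat Flash Thinking via the AI SDK.
import { streamText } from 'ai'

const result = streamText({
  model: 'meituan/longcat-flash-thinking',
  prompt: 'Why is the sky blue?'
})

// Print the response text as chunks arrive.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}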


