LongCat Flash Chat
LongCat Flash Chat is Meituan's 560B Mixture-of-Experts (MoE) conversational model that activates roughly 27B parameters per token on average. It targets high-throughput agentic tool use and complex multi-step interactions under an MIT license.
```ts
import { streamText } from 'ai'

const result = streamText({
  model: 'meituan/longcat-flash-chat',
  prompt: 'Why is the sky blue?',
})

// Consume the stream as it arrives.
for await (const textPart of result.textStream) {
  process.stdout.write(textPart)
}
```

Frequently Asked Questions
How does the zero-computation expert gating work in LongCat Flash Chat?
The MoE router evaluates each input token and activates only the most relevant expert networks, selecting between 18.6B and 31.3B parameters (roughly 27B on average) from the 560B total. The "zero-computation" label refers to identity experts in the pool: they pass a token through unchanged, so tokens the router judges less demanding consume fewer FLOPs and per-token compute varies with input difficulty.
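The routing idea can be sketched in a few lines. This is an illustrative toy, not Meituan's implementation: the expert functions, the scoring heuristic, and the threshold are invented for demonstration. The point is that a token routed to a zero-computation (identity) expert contributes no extra compute.

```ts
// Toy sketch of MoE routing with a zero-computation expert.
// All names, sizes, and thresholds are illustrative, not from LongCat Flash.
type Expert = (x: number[]) => number[]

// A real expert does work; a zero-computation expert is the identity.
const heavyExpert: Expert = (x) => x.map((v) => v * 2) // stands in for an FFN
const zeroComputeExpert: Expert = (x) => x             // identity: no FLOPs spent

// Router: choose one expert per token from a score (a trivial heuristic here).
function route(token: number[]): number[] {
  const score = token.reduce((a, b) => a + Math.abs(b), 0)
  // "Easy" tokens (low score) go to the identity expert and cost nothing extra.
  const chosen = score < 1 ? zeroComputeExpert : heavyExpert
  return chosen(token)
}

route([0.1, 0.2]) // routed to the zero-computation expert, returned unchanged
route([3, 4])     // routed to the heavy expert
```

In the real model the router is learned and experts are transformer FFN blocks, but the budget effect is the same: activated parameters per token land anywhere in the 18.6B–31.3B range depending on how many tokens take the identity path.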
What throughput does LongCat Flash Chat sustain in practice?
MoE dynamic activation reduces per-token compute versus a dense model of equivalent parameter count. Live throughput metrics appear on this page.
What distinguishes Flash Chat from Flash Thinking for tool-calling workflows?
Flash Chat invokes tools and replies immediately without extended internal deliberation. Flash Thinking generates reasoning chains before responding, which improves accuracy on complex tasks but increases latency and token cost. Choose Flash Chat for high-frequency tool calling where response speed is the priority.
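Mechanically, "invokes tools and replies immediately" means the model emits a tool call, the host executes it, and the result is fed straight back for the final answer, with no intermediate reasoning trace. A minimal sketch of that dispatch step, with a hypothetical weather tool invented for illustration:

```ts
// Minimal tool-dispatch sketch of the Flash Chat pattern.
// The tool name, its arguments, and the registry shape are hypothetical.
type ToolCall = { name: string; args: Record<string, unknown> }

const tools: Record<string, (args: Record<string, unknown>) => string> = {
  // Hypothetical tool used purely for illustration.
  getWeather: (args) => `Sunny in ${args.city}`,
}

function dispatch(call: ToolCall): string {
  const tool = tools[call.name]
  if (!tool) throw new Error(`unknown tool: ${call.name}`)
  return tool(call.args)
}

// A model response requesting a tool call would be handled like this:
const toolResult = dispatch({ name: 'getWeather', args: { city: 'Paris' } })
```

With the AI SDK shown above, the equivalent is passing a `tools` option to `streamText`, which runs this loop for you; the sketch just makes the dispatch step explicit.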
What is the context window for LongCat Flash Chat?
LongCat Flash Chat supports a 128K-token context window, with up to 100K tokens per request. This accommodates long conversation histories, multi-document contexts, and extended agentic session transcripts in a single call.
Which benchmarks reflect Flash Chat's strengths?
Flash Chat targets agentic tool use and instruction following at high throughput. For reasoning benchmarks like ARC-AGI, formal proof, and advanced STEM, the Flash Thinking variant covers those capabilities.