LongCat Flash Chat is Meituan's 560B Mixture-of-Experts (MoE) conversational model that activates roughly 27B parameters per token on average. It targets high-throughput agentic tool use and complex multi-step interactions under an MIT license.
LongCat Flash Chat is the direct-response conversational variant of Meituan's LongCat-Flash series. It answers immediately rather than generating extended internal reasoning chains. The architecture uses a zero-computation expert gating mechanism to activate between 18.6B and 31.3B parameters per token (roughly 27B on average) out of the 560B total, so per-token compute scales with the active parameters rather than the full model size; see [pricing] for current rates. The full 560B parameters still shape the model's knowledge and generalization.
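The gating idea can be sketched in a few lines. This is an illustrative toy, not LongCat-Flash's actual router: the expert counts, top-k value, and per-expert sizes below are made-up numbers chosen to show how mixing "zero-computation" (identity) experts into the pool makes the active parameter count vary per token.

```python
import random

# Hypothetical pool: some experts run an FFN, others are identity
# pass-throughs that cost no compute. All sizes are illustrative.
REAL_EXPERTS = 8       # experts with parameters (cost > 0)
ZERO_EXPERTS = 4       # zero-computation identity experts
TOP_K = 2              # experts selected per token
PARAMS_PER_EXPERT = 3  # billions, illustrative only

def route_token(scores):
    """Pick the top-k expert indices for one token from router scores."""
    ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    return ranked[:TOP_K]

def active_params(chosen):
    """Only real experts add compute; zero-computation experts add none."""
    return sum(PARAMS_PER_EXPERT for e in chosen if e < REAL_EXPERTS)

random.seed(0)
for _ in range(3):
    scores = [random.random() for _ in range(REAL_EXPERTS + ZERO_EXPERTS)]
    chosen = route_token(scores)
    # Active parameters per token vary with how many real experts won.
    print(chosen, active_params(chosen))
```

When the router sends a token only to zero-computation experts, that token consumes almost no expert compute, which is how the activated range can dip well below the average.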
The design emphasizes agentic tool use and reliable instruction following across sequential steps. In practice, the model handles structured function calls consistently across many turns, maintains task state through long tool-augmented conversations, and keeps response formatting stable. These properties matter when an agent invokes tools dozens of times in a session, where small formatting drift would otherwise compound into failures.
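On the client side, stable call formatting is what lets a simple dispatch loop run unchanged for many turns. The sketch below assumes a generic JSON tool-call shape with hypothetical tool names; it is not a fixed LongCat schema.

```python
import json

def get_weather(city: str) -> str:
    """Stand-in for a real tool; the name and behavior are hypothetical."""
    return f"sunny in {city}"

TOOLS = {"get_weather": get_weather}

def dispatch(tool_call_json: str) -> str:
    """Parse one structured tool call and run the matching function."""
    call = json.loads(tool_call_json)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# A model that keeps its call format stable lets this loop run for
# dozens of turns without special-casing malformed output.
result = dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}')
```

The result would be appended to the conversation as a tool message and the model queried again, repeating until it returns a plain answer instead of a call.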
Meituan released LongCat Flash Chat under an MIT license. Model weights are publicly available; see the upstream listing. Access it through AI Gateway with one API key.
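A minimal request sketch, assuming an OpenAI-compatible chat-completions endpoint: the base URL and model slug below are placeholders, not confirmed values; check the gateway's own documentation for the exact ones.

```python
import json

# Placeholder endpoint and slug; substitute the gateway's real values.
BASE_URL = "https://example-gateway.invalid/v1/chat/completions"
payload = {
    "model": "meituan/longcat-flash-chat",  # assumed slug
    "messages": [
        {"role": "user", "content": "Summarize this ticket in one line."}
    ],
}
body = json.dumps(payload)
# Send `body` as the POST body with your gateway API key in the
# Authorization header, using urllib.request or any HTTP client.
```

The same key and request shape work across models behind the gateway; only the `model` field changes.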