GLM 5 Turbo was released on March 15, 2026, as the speed-optimized variant in Z.ai's GLM-5 generation. The GLM-5 generation introduced selectable thinking modes that let you dial reasoning depth per request, and GLM 5 Turbo makes that capability affordable at production scale.
Agentic pipelines benefit the most. Many pipeline steps don't need the deliberation depth of the full GLM-5, but they do benefit from structured thinking modes when problems get harder. GLM 5 Turbo lets you route routine steps to a lightweight thinking mode for fast responses and escalate harder steps to a deeper mode, all within the same model and request format.
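As a sketch of that routing pattern: the escalation decision can live in a small helper that picks a thinking depth per step before the request is built. The mode names (`"light"`, `"deep"`), the `difficulty` score, and the request field that would carry the mode are all assumptions for illustration, not the documented GLM-5 parameter names; check Z.ai's API reference for the real ones.

```typescript
// Hypothetical thinking-mode labels -- the actual GLM-5 values may differ.
type ThinkingMode = "light" | "deep";

interface AgentStep {
  description: string;
  // Rough difficulty score the pipeline is assumed to already track (0..1).
  difficulty: number;
}

// Escalate to a deeper thinking mode only when a step crosses a
// difficulty threshold; everything else stays on the fast path.
function selectThinkingMode(step: AgentStep, threshold = 0.6): ThinkingMode {
  return step.difficulty >= threshold ? "deep" : "light";
}

const mode = selectThinkingMode({ description: "summarize tool output", difficulty: 0.2 });
// The chosen mode would then be attached to the chat request body, e.g.
// { model: "glm-5-turbo", thinking: mode, messages: [...] } -- field name assumed.
```

The point is that both branches hit the same model with the same payload shape; only the thinking-mode field changes between routine and hard steps.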
The Turbo variant also inherits GLM-5's improved long-range planning and agentic coding capabilities. Combined with the lower per-token cost and higher throughput, this makes it practical to run multi-step agent workflows that would be prohibitively expensive at full GLM-5 pricing. Through AI Gateway, GLM 5 Turbo shares the same API surface as GLM-5.
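Sharing an API surface means switching variants is a one-field change: the request payload stays identical and only the model identifier differs. The model slugs below (`zai/glm-5`, `zai/glm-5-turbo`) are assumptions for illustration; confirm the exact identifiers in the AI Gateway model catalog.

```typescript
// Minimal chat-request shape shared by both variants (simplified sketch).
interface ChatRequest {
  model: string;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
}

function buildRequest(model: string, prompt: string): ChatRequest {
  return { model, messages: [{ role: "user", content: prompt }] };
}

// Same prompt, same payload shape -- only the model slug changes.
const full = buildRequest("zai/glm-5", "Draft a migration plan.");
const turbo = buildRequest("zai/glm-5-turbo", "Draft a migration plan.");
```

Because nothing else in the request changes, an agent can downgrade or upgrade between the two variants per step without touching the rest of its pipeline.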