GLM 4.7 FlashX
GLM 4.7 FlashX is the ultra-fast inference variant in Z.ai's GLM-4.7 generation, released January 1, 2025. Designed for the lowest-latency workloads, it provides the fastest response times in the GLM-4.7 family while retaining core coding and reasoning capabilities.
import { streamText } from 'ai'

const result = streamText({
  model: 'zai/glm-4.7-flashx',
  prompt: 'Why is the sky blue?',
})
What To Consider When Choosing a Provider
Zero Data Retention
AI Gateway does not currently support Zero Data Retention (ZDR) for this model. See the documentation for models that support ZDR.
Authentication
AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
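In practice that means exporting a single key before running your app; the variable name below follows AI Gateway's standard configuration and is stated here as an assumption:

```shell
# Assumed variable name for AI Gateway's unified key; no Z.ai credentials needed.
export AI_GATEWAY_API_KEY=your-gateway-key
```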
GLM 4.7 FlashX is the right choice when response time is the binding constraint. If quality on complex tasks matters more, step up to GLM-4.7-Flash or GLM-4.7.
Use AI Gateway to route requests by complexity. Simple extraction, classification, and short generation tasks perform well on GLM 4.7 FlashX. Route complex reasoning to higher-tier models.
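A minimal sketch of such routing, assuming a simple task-kind and prompt-length heuristic (the task names and the 4,000-character threshold are illustrative, not an AI Gateway feature):

```typescript
// Illustrative task shape for complexity-based routing.
type Task = { kind: 'extract' | 'classify' | 'generate' | 'reason'; prompt: string }

function pickModel(task: Task): string {
  // Send multi-step reasoning and very long prompts up a tier;
  // everything else stays on the low-latency FlashX tier.
  if (task.kind === 'reason' || task.prompt.length > 4_000) {
    return 'zai/glm-4.7'
  }
  return 'zai/glm-4.7-flashx'
}
```

The returned identifier is passed straight to `streamText` or `generateText` as the `model` option, since all GLM-4.7 variants share the same API surface.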
At the lowest per-token cost in the 4.7 generation, GLM 4.7 FlashX is the most economical option for workloads measured in millions of daily requests.
When to Use GLM 4.7 FlashX
Best For
Real-time user-facing applications:
Sub-second response times are required for acceptable user experience
High-frequency API endpoints:
Thousands of requests per minute where latency compounds into throughput bottlenecks
Simple extraction and classification:
Tasks that need language understanding without deep reasoning
Pipeline preprocessing steps:
Steps that block downstream processing benefit from the fastest possible completion
Cost-optimized batch processing:
Extreme volume where per-token cost is the primary economic driver
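For the simple-classification case above, the prompt can be kept short and rigid so the model has little to do beyond emitting a label. This helper is a hypothetical sketch, not part of any SDK:

```typescript
// Hypothetical helper: builds a constrained single-label classification prompt.
function classificationPrompt(text: string, labels: string[]): string {
  return [
    'Classify the text into exactly one of the labels.',
    `Labels: ${labels.join(', ')}`,
    `Text: ${text}`,
    'Answer with the label only.',
  ].join('\n')
}
```

The result is passed as the `prompt` to a call with `model: 'zai/glm-4.7-flashx'`; constraining the output to a single label keeps completions short, which compounds the latency advantage.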
Consider Alternatives When
Complex reasoning quality:
GLM-4.7 or GLM-4.7-Flash provides deeper capability for multi-step planning
Balanced speed and capability:
GLM-4.7-Flash offers a middle ground in the 4.7 generation
Speed-optimized vision:
Evaluate GLM-4.6V-Flash for multimodal processing when vision is needed
Advanced thinking modes:
GLM-5 provides multiple thinking modes and an expanded reasoning architecture
Conclusion
GLM 4.7 FlashX occupies the speed extreme of Z.ai's GLM-4.7 generation. For teams that measure success in milliseconds and process requests at massive scale, it provides the lowest-latency entry point to the 4.7 generation's improvements in coding, reasoning, and conversational quality.
FAQ
How fast is GLM 4.7 FlashX compared to other GLM-4.7 models?
GLM 4.7 FlashX is the fastest inference tier in the GLM-4.7 generation. It provides the lowest latency, followed by GLM-4.7-Flash, then the full GLM-4.7.
What are the trade-offs of using GLM 4.7 FlashX?
It trades peak reasoning and coding depth for speed. Core capabilities are retained, but the most complex multi-step reasoning and code generation tasks will produce better results on GLM-4.7 or GLM-4.7-Flash.
Can I combine GLM 4.7 FlashX with other GLM-4.7 models?
Yes. All GLM-4.7 variants share the same API surface. Route simple requests to GLM 4.7 FlashX for speed and complex ones to GLM-4.7 for quality.
What is GLM 4.7 FlashX's context window?
200K tokens.
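As a rough sketch of budgeting against that window (the 4-characters-per-token ratio is a common rule of thumb, not the model's actual tokenizer):

```typescript
const CONTEXT_WINDOW = 200_000 // tokens

// Estimate whether a prompt plus the requested output fits in the window.
function fitsInContext(promptChars: number, maxOutputTokens: number): boolean {
  const estimatedPromptTokens = Math.ceil(promptChars / 4) // ~4 chars/token assumption
  return estimatedPromptTokens + maxOutputTokens <= CONTEXT_WINDOW
}
```

A check like this is useful in high-volume pipelines, where truncating or chunking oversized inputs before the API call avoids wasted requests.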
Do I need a Z.ai API key to use GLM 4.7 FlashX?
AI Gateway provides a unified API key. No separate Z.ai account is needed. Use the model identifier to route requests. BYOK is also supported.
What workloads is GLM 4.7 FlashX best suited for?
Real-time user-facing applications, high-frequency API calls, simple classification and extraction, and any workload where response latency is the primary constraint.
How much does GLM 4.7 FlashX cost?
Check the pricing panel on this page for today's numbers. AI Gateway tracks rates across every provider that serves GLM 4.7 FlashX.