GLM-4.7-FlashX was released on January 1, 2025 as the fastest inference tier in Z.ai's GLM-4.7 generation. It targets workloads where response latency is the dominant constraint: real-time user-facing applications, high-frequency API calls, and pipeline steps that block downstream processing.
As the most aggressively speed-optimized variant in the 4.7 family, GLM-4.7-FlashX makes the largest capability tradeoff relative to the full GLM-4.7. It retains the generation's core improvements in coding, reasoning, and conversational tone, but its ceiling on the most complex tasks is lower. The tradeoff is intentional: for the majority of production requests that don't require maximum reasoning depth, GLM-4.7-FlashX delivers adequate quality at the lowest latency in the family.
The model shares the same API surface as GLM-4.7 and GLM-4.7-Flash, so switching tiers requires changing only the model identifier in a request. Teams can route simple requests to GLM-4.7-FlashX and complex ones to GLM-4.7, optimizing both cost and quality across their request distribution.
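The routing pattern described above can be sketched as a small dispatch function. This is an illustrative sketch only: the model identifiers (`glm-4.7`, `glm-4.7-flashx`), the keyword cues, and the token threshold are all assumptions standing in for whatever complexity signal a real deployment would use, not documented API values.

```python
def pick_model(prompt: str, max_flash_tokens: int = 200) -> str:
    """Route a request to a model tier by a rough complexity estimate.

    Prompt length and a few keyword cues stand in for a real complexity
    classifier; both the cues and the threshold are hypothetical.
    """
    heavy_cues = ("prove", "step by step", "refactor", "debug")
    approx_tokens = len(prompt.split())  # crude whitespace token count
    if approx_tokens > max_flash_tokens or any(
        cue in prompt.lower() for cue in heavy_cues
    ):
        return "glm-4.7"        # full model for complex requests
    return "glm-4.7-flashx"     # fast tier for everything else
```

Because the tiers share one API surface, the chosen identifier is the only per-request change; everything else in the request payload stays identical across tiers.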