GLM 4.7 Flash
GLM 4.7 Flash is the speed-optimized variant in Z.ai's GLM-4.7 generation. It delivers faster inference for high-throughput workloads while retaining the coding, tool-use, and conversational improvements introduced in GLM-4.7.
```typescript
import { streamText } from 'ai'

const result = streamText({
  model: 'zai/glm-4.7-flash',
  prompt: 'Why is the sky blue?',
})
```

Frequently Asked Questions
How does GLM 4.7 Flash compare to the full GLM-4.7?
GLM 4.7 Flash shares the same foundational improvements (coding, tool usage, multi-step reasoning, natural tone) but is optimized for faster inference at lower cost. Peak capability on complex tasks will be lower than GLM-4.7.
What is the difference between GLM 4.7 Flash and GLM-4.7-FlashX?
GLM 4.7 Flash provides more capability with moderate speed optimization. GLM-4.7-FlashX is the fastest tier in the generation, trading more capability for the lowest possible latency.
Can I switch between GLM-4.7 variants easily?
Yes. All variants share the same API surface. Change the model identifier to switch between GLM-4.7, GLM-4.7-Flash, and GLM-4.7-FlashX.
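Because the variants differ only in the model identifier string, switching can be wrapped in a small helper. A minimal sketch: only `zai/glm-4.7-flash` appears on this page, so the `zai/glm-4.7` and `zai/glm-4.7-flashx` identifiers are assumptions following the same naming pattern and should be checked against the model catalog.

```typescript
// Pick a GLM-4.7 variant by capability/latency tier.
// NOTE: only 'zai/glm-4.7-flash' is confirmed on this page; the other two
// identifiers are assumed to follow the same pattern.
type GlmTier = 'full' | 'flash' | 'flashx'

function glmModelId(tier: GlmTier): string {
  switch (tier) {
    case 'full':
      return 'zai/glm-4.7' // highest capability
    case 'flash':
      return 'zai/glm-4.7-flash' // speed-optimized
    case 'flashx':
      return 'zai/glm-4.7-flashx' // lowest latency
  }
}
```

The helper's return value drops directly into the `model` field of a `streamText` call, so an app can switch tiers with a config flag rather than a code change.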
What is the context window for GLM 4.7 Flash?
200K tokens.
How do I authenticate with GLM 4.7 Flash through AI Gateway?
AI Gateway provides a unified API key. No separate Z.ai account is needed. Use the model identifier to route requests. BYOK is supported for direct provider accounts.
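In practice, the unified key is typically supplied via an environment variable so no credentials appear in code. A minimal sketch; the `AI_GATEWAY_API_KEY` variable name is an assumption, so confirm it against your gateway's documentation.

```shell
# Assumed env var name for the AI Gateway unified key (verify in your gateway docs).
# With this set, requests to 'zai/glm-4.7-flash' are routed and billed
# through the gateway; no separate Z.ai credential is required.
export AI_GATEWAY_API_KEY="your-gateway-key"
```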
Is GLM 4.7 Flash suitable for frontend development?
Yes. It inherits the frontend development improvements from GLM-4.7, though the full GLM-4.7 may produce slightly better results on complex UI generation tasks.
What is the pricing for GLM 4.7 Flash?
See the pricing section on this page for today's rates. AI Gateway exposes each provider's pricing for GLM 4.7 Flash.