Qwen3 Max Preview
Qwen3 Max Preview is Alibaba's early-access release of its trillion-parameter Qwen3-Max model, providing developers with ahead-of-schedule access to Qwen3-Max capabilities for evaluation and prototyping.
import { streamText } from 'ai'

const result = streamText({
  model: 'alibaba/qwen3-max-preview',
  prompt: 'Why is the sky blue?',
})
Playground
Try out Qwen3 Max Preview by Alibaba. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.
About Qwen3 Max Preview
Qwen3 Max Preview reflects the state of Alibaba's trillion-parameter Qwen3-Max model ahead of its general availability (GA) cutover. Teams that need to assess model behavior before committing to Qwen3-Max in production, for example output formatting, instruction-following fidelity, or multilingual quality, can do so through the preview identifier without waiting for the GA release window.
The preview shares the same mixture-of-experts (MoE) architecture as the production release, with more than one trillion total parameters and a context window extending to 262.1K tokens. Its capability profile spans mathematics, coding, structured output, and multilingual dialogue.
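As a rough illustration of working within the 262.1K-token context window, the sketch below checks whether a prompt is likely to fit. The four-characters-per-token ratio and the helper names are assumptions made for this example; the model's real tokenizer will give different counts, so treat the result as an estimate only.

```typescript
// Context window quoted on this page (262.1K tokens), written out approximately.
const CONTEXT_WINDOW_TOKENS = 262100;

// Assumption: roughly 4 characters per token. This is a heuristic, not the
// model's actual tokenizer.
const ASSUMED_CHARS_PER_TOKEN = 4;

// Estimates the token count of a string from its character length.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / ASSUMED_CHARS_PER_TOKEN);
}

// Returns true when the estimated prompt size plus the tokens reserved for
// the reply stays within the context window.
function fitsContextWindow(prompt: string, reservedOutputTokens: number): boolean {
  return estimateTokens(prompt) + reservedOutputTokens <= CONTEXT_WINDOW_TOKENS;
}
```

Reserving output tokens up front leaves room for the reply when a long document is packed into the prompt.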
The practical value of the preview designation is continuity: developers who build and test against Qwen3 Max Preview can observe the model's behavior under realistic conditions, validate prompt templates, and build cost estimates before the production model reaches its stable pricing tier. Because AI Gateway abstracts the underlying provider endpoint, migrating from the preview to the GA identifier requires only a model string change in your application configuration.
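One way to keep that migration to a single string change is to resolve the model identifier from configuration instead of hard-coding it at each call site. The sketch below is a minimal example under assumptions: `AppConfig`, its `modelId` field, and `resolveModelId` are hypothetical names, and only the preview identifier comes from this page.

```typescript
// Preview identifier as shown in the snippet on this page.
const PREVIEW_MODEL_ID = 'alibaba/qwen3-max-preview';

// Hypothetical application config; `modelId` is an illustrative field name.
interface AppConfig {
  // When set, overrides the preview identifier.
  modelId?: string;
}

// Returns the configured model identifier, falling back to the preview
// identifier when the override is missing or blank.
function resolveModelId(config: AppConfig): string {
  const override = config.modelId?.trim();
  return override ? override : PREVIEW_MODEL_ID;
}
```

The returned string can be passed as the `model` value in a `streamText` call like the one at the top of this page. The exact GA identifier is not given here, so confirm it before setting the override.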
Providers
Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
What To Consider When Choosing a Provider
- Configuration: Preview models may have changing rate limits or capabilities; confirm the stability guarantees of your chosen provider's preview access before building critical production paths.
- Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
When to Use Qwen3 Max Preview
Best For
- Pre-GA evaluation: Assessing trillion-parameter model behavior before committing production workloads to Qwen3-Max GA
- Prototyping against a near-final model: Iterating on prompt structures, output schemas, and retrieval-augmented workflows
- Early-access benchmarking: Internal A/B testing that requires access to a frontier-scale model ahead of GA
- Schema validation ahead of rollout: Developer teams validating JSON formatting and tool-calling schemas prior to production
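For the schema-validation use case above, a small checker can confirm that a model reply parses as JSON and matches the expected shape before it reaches downstream code. This is a sketch only: the `Summary` shape (a `title` string and a `tags` array) is a hypothetical schema chosen for illustration, not something this page or the model defines.

```typescript
// Hypothetical output shape used only for this illustration.
interface Summary {
  title: string;
  tags: string[];
}

type ParseResult =
  | { ok: true; value: Summary }
  | { ok: false; error: string };

// Parses raw model output and checks it against the Summary shape.
function parseSummary(raw: string): ParseResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, error: 'not valid JSON' };
  }
  if (typeof data !== 'object' || data === null || Array.isArray(data)) {
    return { ok: false, error: 'expected a JSON object' };
  }
  const record = data as Record<string, unknown>;
  const title = record['title'];
  const tags = record['tags'];
  if (typeof title !== 'string') {
    return { ok: false, error: 'title must be a string' };
  }
  if (!Array.isArray(tags) || !tags.every((t) => typeof t === 'string')) {
    return { ok: false, error: 'tags must be an array of strings' };
  }
  return { ok: true, value: { title, tags: tags as string[] } };
}
```

Running replies from the preview through a checker like this gives a pass/fail signal that can be compared again after switching identifiers.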
Consider Alternatives When
- GA stability required: If your use case demands stability guarantees, move to Qwen3-Max once it reaches GA
- Reasoning-intensive workloads: Consider Qwen3-Max-Thinking when visible chain-of-thought is needed
- Latency-sensitive traffic: Capacity constraints on preview access can cause unacceptable latency variance
- Budget predictability: Preview pricing periods can create uncertainty for teams that need stable per-token costs
Conclusion
Qwen3 Max Preview offers a structured way to integrate a large-scale language model from Alibaba into your stack before it reaches general availability. Because provider routing and authentication are handled through AI Gateway, transitioning to the GA model is a single-line configuration change, making the preview period genuinely useful for integration work rather than just experimentation.
Frequently Asked Questions
What is the relationship between Qwen3 Max Preview and Qwen3-Max?
Qwen3 Max Preview provides early access to the same underlying trillion-parameter model. The preview designation signals ahead-of-GA access; capability and architecture are the same as the production release.
Are there rate limits specific to the preview version?
Preview models may be subject to capacity-based rate limits that differ from the GA release.
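Clients that may hit capacity-based limits often retry with increasing delays. The sketch below computes a capped exponential backoff schedule; the function name and every numeric value are placeholders, since this page does not state actual limits or a recommended retry policy, and production code would usually add random jitter.

```typescript
// Returns the wait time in milliseconds before each retry: the delay doubles
// on every attempt and is capped at capMs. All parameters are caller-chosen
// placeholders, not values published for this model.
function backoffDelaysMs(maxRetries: number, baseMs: number, capMs: number): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(Math.min(capMs, baseMs * Math.pow(2, attempt)));
  }
  return delays;
}
```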
How large is the context window on Qwen3 Max Preview?
262.1K tokens, matching the Qwen3-Max production release.
Will my prompts built for the preview work with the GA model?
In most cases yes, since the models share the same architecture and training. Thorough regression testing before switching identifiers is recommended, as minor behavioral changes can occur between preview and GA.
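A regression pass of this kind can be as simple as collecting outputs for the same prompts from both identifiers and listing where they differ. The sketch below assumes the outputs were already gathered into plain objects keyed by prompt; the type and function names are hypothetical. Whitespace-normalized exact matching suits deterministic, structured outputs, while free-form text will usually need a looser comparison.

```typescript
// Outputs keyed by prompt, collected separately for each model identifier.
type Outputs = Record<string, string>;

interface Regression {
  prompt: string;
  preview: string;
  candidate: string | undefined; // undefined when the candidate has no output
}

// Collapses runs of whitespace so formatting-only differences are ignored.
function normalize(text: string): string {
  return text.trim().replace(/\s+/g, ' ');
}

// Lists prompts whose candidate output is missing or differs from the
// preview output after whitespace normalization.
function findRegressions(preview: Outputs, candidate: Outputs): Regression[] {
  const diffs: Regression[] = [];
  for (const prompt of Object.keys(preview)) {
    const before: string = preview[prompt] ?? '';
    const after: string | undefined = candidate[prompt];
    if (after === undefined || normalize(before) !== normalize(after)) {
      diffs.push({ prompt, preview: before, candidate: after });
    }
  }
  return diffs;
}
```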
Does the preview support context caching?
Context caching availability depends on the serving provider; confirm support at your chosen provider before designing a caching strategy around repeated long prompts.
What coding and math benchmarks has the underlying model been evaluated on?
The underlying Qwen3-Max model scored 69.6% on SWE-bench Verified and 79.3% on LiveBench, with competitive results on AIME mathematical reasoning tasks.
Is it possible to access the model weights for Qwen3 Max Preview?
No. Qwen3-Max is a closed-weight model available only via API, both in preview and GA form.