DeepSeek V4 Pro
DeepSeek V4 Pro, released April 23, 2026, is the top-tier model in DeepSeek's V4 series. It pairs a hybrid attention architecture with a 1.0M-token context window and targets complex reasoning, multi-step problem solving, and agentic tasks.
```typescript
import { streamText } from 'ai'

const result = streamText({
  model: 'deepseek/deepseek-v4-pro',
  prompt: 'Why is the sky blue?',
})
```

Playground
Try out DeepSeek V4 Pro by DeepSeek. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.
Providers
Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
- P50 throughput on live AI Gateway traffic, in tokens per second (TPS)
- P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds
- Direct request success rate, on AI Gateway overall and per provider

Visit the docs for more info.
About DeepSeek V4 Pro
DeepSeek V4 Pro was released April 23, 2026 as the high-capability tier of DeepSeek's V4 generation. The V4 series introduces a hybrid attention architecture that combines Compressed Sparse Attention (CSA) with Heavily Compressed Attention (HCA), and uses Manifold-Constrained Hyper-Connections (mHC) in place of standard residual connections. Together, these support efficient inference across the full 1.0M-token context window.
DeepSeek V4 Pro is positioned for complex reasoning, multi-step problem solving, and agentic workflows. Tool use, reasoning, and implicit caching are all supported, so DeepSeek V4 Pro fits planner-style pipelines where the model decides on tool calls, integrates results, and iterates toward an answer. Maximum output is 1.0M tokens, which gives long-form reasoning chains and tool-call sequences room to complete in a single response.
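The planner-style loop described above can be sketched generically. The sketch below is illustrative control flow only, not the AI SDK's tool API: the model is a scripted stub, and `lookup` is a hypothetical tool invented for the example.

```typescript
type ToolCall = { tool: string; args: Record<string, unknown> };
type ModelTurn = { toolCall?: ToolCall; answer?: string };

// Hypothetical tool: look up a constant in a tiny in-memory table.
const facts: Record<string, string> = { pi: "3.14159" };
const tools: Record<string, (args: Record<string, unknown>) => string> = {
  lookup: (args) => facts[String(args.key)] ?? "not found",
};

// Scripted stand-in for the model: it first requests a tool call,
// then uses the tool result to produce a final answer.
function plannerStub(history: string[]): ModelTurn {
  const toolMsg = history.find((m) => m.startsWith("tool:"));
  if (!toolMsg) return { toolCall: { tool: "lookup", args: { key: "pi" } } };
  return { answer: `pi is approximately ${toolMsg.slice(5)}` };
}

// The host loop: run the model, execute any requested tool,
// feed the result back, and stop when an answer appears.
function runAgent(prompt: string): string {
  const history = [`user:${prompt}`];
  for (let step = 0; step < 8; step++) {
    const turn = plannerStub(history);
    if (turn.answer !== undefined) return turn.answer;
    if (turn.toolCall) {
      history.push(`tool:${tools[turn.toolCall.tool](turn.toolCall.args)}`);
    }
  }
  return "no answer";
}
```

In production the stub is replaced by a real model call (for example `streamText` with tools through the AI SDK), but the plan / call / integrate / iterate control flow is the same.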
Access is through AI Gateway with an AI Gateway API key or OIDC token. You can integrate through the AI SDK, Chat Completions, Responses, or Messages API formats. Implicit caching applies when a long input prefix repeats across calls, charging the cached input rate instead of the standard input rate for cached tokens.
What To Consider When Choosing a Provider
- Cost: DeepSeek V4 Pro is priced higher than the V4 Flash variant. If your workload is short-form instruction following or classification, DeepSeek V4 Flash delivers similar output limits at lower per-token rates.
- Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
When to Use DeepSeek V4 Pro
Best For
- Complex reasoning workloads: Analytical research, technical synthesis, and structured derivation benefit from the 1.0M-token window and V4 architecture
- Multi-step agent pipelines: Planning, tool calls, and result integration combine in a single endpoint
- Software engineering automation: Reasoning and reliable tool use across long context support code-generation pipelines
- Multi-format API integrations: AI SDK, Chat Completions, Responses, and Messages API formats route to the same high-capability model
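Because the same model answers through every supported format, switching in DeepSeek V4 Pro from a Chat Completions-style integration is mostly a matter of the model slug. A minimal sketch of building such a request; the base URL below is a placeholder, not a documented endpoint:

```typescript
// Build a Chat Completions-format request for DeepSeek V4 Pro.
// GATEWAY_URL is a placeholder: substitute your gateway's documented
// endpoint, and pass your AI Gateway API key as the bearer token.
const GATEWAY_URL = "https://example-gateway.invalid/v1/chat/completions";

function buildRequest(apiKey: string, userMessage: string) {
  return {
    url: GATEWAY_URL,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model: "deepseek/deepseek-v4-pro",
        messages: [{ role: "user", content: userMessage }],
      }),
    },
  };
}

// Once the URL and key are real: fetch(req.url, req.init)
```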
Consider Alternatives When
- Short-form throughput: Use DeepSeek V4 Flash for classification, routing, and instruction following at lower per-token cost
- MIT-licensed reasoning: DeepSeek-R1 remains the open-weights reasoning specialist when license terms drive the selection
- Earlier-generation cost: DeepSeek V3 family models may meet capability needs at lower cost when V4 context and architecture aren't required
Conclusion
DeepSeek V4 Pro is the capability tier of the V4 generation, suited to complex reasoning and agentic workloads across its 1.0M-token window. For short-form, high-volume tasks within the same generation, DeepSeek V4 Flash is the cost-efficient alternative.
Frequently Asked Questions
When should I pick DeepSeek V4 Pro over DeepSeek V4 Flash?
Pick DeepSeek V4 Pro for complex reasoning, multi-step problem solving, and agentic tool orchestration. Use DeepSeek V4 Flash for short-form instruction following, classification, and high-volume routing where per-token cost dominates.
What is the context window and max output for DeepSeek V4 Pro?
The context window is 1.0M tokens and the maximum output is 1.0M tokens.
What is the V4 hybrid attention architecture?
DeepSeek V4 Pro combines Compressed Sparse Attention (CSA) with Heavily Compressed Attention (HCA), and uses Manifold-Constrained Hyper-Connections (mHC) in place of standard residual connections. The combination targets efficient inference at long context.
Does DeepSeek V4 Pro support tool calls inside reasoning steps?
Yes. DeepSeek V4 Pro is tagged for reasoning and tool use, so agent pipelines can plan, call tools, integrate results, and iterate in a single call through the AI SDK or the Chat Completions, Responses, or Messages API formats.
How does implicit caching affect pricing for DeepSeek V4 Pro?
Repeated input prefixes (typically long system prompts) are detected automatically and charged at the cached input rate ($0.003625 per token) instead of the standard input rate ($0.435 per token). No cache-control headers are required.
Do I need a DeepSeek platform account to use DeepSeek V4 Pro?
No. Access DeepSeek V4 Pro through AI Gateway with an AI Gateway API key or OIDC token.
Does DeepSeek V4 Pro support zero data retention?
Yes, Zero Data Retention is available for this model. Zero Data Retention is offered on a per-provider basis. See https://vercel.com/docs/ai-gateway/capabilities/zdr for details.