DeepSeek V4 Flash

DeepSeek V4 Flash is DeepSeek's April 23, 2026 efficiency-tier model in the V4 series. It pairs a hybrid attention architecture with a context window of 1.0M tokens and supports reasoning, tool use, and implicit caching.

Reasoning · Tool Use · Implicit Caching
index.ts

```ts
import { streamText } from 'ai';

const result = streamText({
  model: 'deepseek/deepseek-v4-flash',
  prompt: 'Why is the sky blue?',
});

// Stream the response text as it arrives.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}
```

About DeepSeek V4 Flash

DeepSeek V4 Flash was released on April 23, 2026 as part of DeepSeek's V4 generation. The V4 series introduces a hybrid attention architecture that combines Compressed Sparse Attention (CSA) with Heavily Compressed Attention (HCA), along with Manifold-Constrained Hyper-Connections (mHC) that refine standard residual connections. The combination targets efficient inference across the full 1.0M-token context window.

DeepSeek V4 Flash is positioned as the efficiency tier of the V4 lineup. It handles instruction following, classification, short-form Q&A, and other tasks where latency and per-token cost matter more than maximum reasoning depth. Maximum output is 1.0M tokens, the same budget as DeepSeek V4 Pro, so single-call response length is not the differentiator; the split between Flash and Pro comes down to capability depth and cost.

DeepSeek V4 Flash supports tool use and reasoning, and the model is tagged for implicit caching. Implicit caching reduces input-token charges for repeated prefixes without requiring explicit cache-control headers in the request. Access is through AI Gateway with an AI Gateway API key or OIDC token, so you don't need a separate DeepSeek platform account.
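To make the caching behavior concrete, here is a minimal sketch of how a discounted rate for cached prefix tokens changes the input-side cost of a call. The per-token rates and the 90% cache discount below are hypothetical placeholders, not DeepSeek's actual pricing:

```ts
// Hypothetical rates for illustration only (not actual DeepSeek pricing).
const INPUT_RATE = 1.0;   // $ per 1M fresh input tokens
const CACHED_RATE = 0.1;  // $ per 1M cached input tokens

// Cost of the input side of one call, given how many prompt tokens
// were served from an implicitly cached prefix.
function inputCost(promptTokens: number, cachedPrefixTokens: number): number {
  const fresh = promptTokens - cachedPrefixTokens;
  return (fresh * INPUT_RATE + cachedPrefixTokens * CACHED_RATE) / 1_000_000;
}

// A 1M-token prompt where a 900K-token shared prefix (e.g. a large
// document reused across calls) hits the implicit cache:
const noCache = inputCost(1_000_000, 0);         // 1.00
const withCache = inputCost(1_000_000, 900_000); // 0.19
```

Because the caching is implicit, no cache-control headers are needed to get this effect; the gateway detects the repeated prefix and applies the discounted rate on its own.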