
Claude Opus 4

Claude Opus 4 is a coding model from Anthropic with strong benchmark scores, including 72.5% on SWE-bench Verified and 43.2% on Terminal-bench, with sustained performance on multi-hour agentic tasks and hybrid extended thinking with tool use.

File Input, Reasoning, Tool Use, Vision (Image), Explicit Caching
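index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'anthropic/claude-opus-4',
  prompt: 'Why is the sky blue?'
})

// Print the response as it streams in
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}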

Playground

Try out Claude Opus 4 by Anthropic. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

| Provider | Legal | Context | Latency | Throughput | Input | Output | Cache read | Cache write | Web search (per query) | Release date |
| Anthropic | Terms, Privacy | 200K | 1.6s | 44tps | $15.00/M | $75.00/M | $1.5/M | $18.75/M | $10.00/K + input costs | 08/05/2025 |
| Amazon Bedrock | Terms, Privacy | 200K | 3.1s | 18tps | $15.00/M | $75.00/M | $1.5/M | $18.75/M | | 08/05/2025 |
| Google Vertex AI | Terms, Privacy | 200K | 1.3s | 44tps | $15.00/M | $75.00/M | $1.5/M | $18.75/M | $10.00/K + input costs | 08/05/2025 |
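
The latency and throughput columns above can be combined into a rough completion-time estimate when deciding which provider to prefer. The following is a minimal sketch, not part of AI Gateway: the `ProviderStats` type, the `estimateSeconds` and `preferenceOrder` helpers, and the slug strings are assumptions made for illustration, and the numbers are a snapshot copied from the table.

```typescript
// Figures copied from the provider table; live values change over time.
interface ProviderStats {
  slug: string // hypothetical slug; copy the real one from the page
  latencySeconds: number // P50 time to first token
  throughputTps: number // P50 tokens per second
}

const providers: ProviderStats[] = [
  { slug: 'anthropic', latencySeconds: 1.6, throughputTps: 44 },
  { slug: 'bedrock', latencySeconds: 3.1, throughputTps: 18 },
  { slug: 'vertex', latencySeconds: 1.3, throughputTps: 44 },
]

// Rough time to receive `outputTokens` tokens:
// time to first token plus generation time at the listed throughput.
function estimateSeconds(p: ProviderStats, outputTokens: number): number {
  return p.latencySeconds + outputTokens / p.throughputTps
}

// Sort slugs from fastest to slowest estimated completion.
function preferenceOrder(stats: ProviderStats[], outputTokens: number): string[] {
  return [...stats]
    .sort((a, b) => estimateSeconds(a, outputTokens) - estimateSeconds(b, outputTokens))
    .map((p) => p.slug)
}

const order = preferenceOrder(providers, 1000)
console.log(order) // [ 'vertex', 'anthropic', 'bedrock' ]
```

The resulting list is only an ordering by estimated speed. How it is passed to the gateway as a provider preference depends on the option names given in the AI Gateway docs.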
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.
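
Both the throughput and latency figures are P50 values, meaning half of the observed requests fall at or below the reported number. As a small illustration, a P50 can be computed as the median of a set of samples. The page does not state how the gateway aggregates its data, so the `p50` helper and the sample values below are assumptions.

```typescript
// Median (P50) of a list of samples: half the observations fall at or below it.
function p50(samples: number[]): number {
  if (samples.length === 0) throw new Error('no samples')
  const sorted = [...samples].sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  // Odd count: the middle element. Even count: the mean of the two middle elements.
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2
}

// Hypothetical time-to-first-token samples in milliseconds.
// The single slow request (9000 ms) does not move the median.
const ttftMs = [1200, 1500, 1600, 1700, 9000]
console.log(p50(ttftMs)) // 1600
```

A median is less sensitive to occasional slow requests than a mean, which is one reason P50 is a common choice for latency dashboards.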

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.

More models by Anthropic

| Model | Context | Latency | Throughput | Input | Output | Cache read | Cache write | Web search (per query) | Providers | Release date |
| | 1M | 0.8s | 102tps | $5.00/M | $25.00/M | $0.5/M | $6.25/M | $10/K + input costs | Anthropic, Amazon Bedrock, Google Vertex AI | 04/16/2026 |
| | 1M | 0.7s | 54tps | $3.00/M | $15.00/M | $0.3/M | $3.75/M | $10/K + input costs | Anthropic, Amazon Bedrock, Google Vertex AI | 02/17/2026 |
| | 1M | 0.7s | 48tps | $5.00/M | $25.00/M | $0.5/M | $6.25/M | $10/K + input costs | Anthropic, Amazon Bedrock, Google Vertex AI | 02/05/2026 |
| | 200K | 0.6s | 111tps | $1.00/M | $5.00/M | $0.1/M | $1.25/M | $10.00/K + input costs | Anthropic, Amazon Bedrock, Google Vertex AI | 10/15/2025 |
| | 1M | 0.7s | 58tps | $3.00/M | $15.00/M | $0.3/M | $3.75/M | $10.00/K + input costs | Anthropic, Amazon Bedrock, Google Vertex AI | 09/29/2025 |
| | 200K | 0.6s | 52tps | $5.00/M | $25.00/M | $0.5/M | $6.25/M | $10.00/K + input costs | Anthropic, Amazon Bedrock, Google Vertex AI | 11/24/2024 |

About Claude Opus 4

Claude Opus 4 launched on May 22, 2025 alongside Claude Sonnet 4. Anthropic positioned it for demanding coding workloads. It scored 72.5% on SWE-bench Verified and 43.2% on Terminal-bench. These scores were achieved without extended thinking, showing that Opus 4's baseline capability advanced meaningfully beyond previous models.

Sustained performance is what most clearly set Opus 4 apart from its predecessors. Rakuten validated the model with a demanding open-source refactor that ran independently for seven hours, maintaining focus and coherence over hundreds of individual steps. Cursor called it strong for coding and a leap forward in complex codebase understanding. Block reported it was the first model to boost code quality during editing and debugging in their agent (codename goose) while maintaining full reliability. Cognition noted Opus 4 handled critical actions that previous models had missed on complex challenges.

The Claude 4 launch introduced extended thinking with tool use in beta. Both Opus 4 and Sonnet 4 can alternate between reasoning and tool use like web search during a single extended thinking session. This enables research patterns where Claude searches, reasons about results, searches again based on that reasoning, and synthesizes across the full chain. Memory capabilities also improved substantially: when given local file access, Opus 4 creates and maintains memory files to store key information, enabling better long-term coherence on extended tasks.
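
The search, reason, search again, synthesize pattern described above can be pictured as a loop in which each step is a reasoning note, a tool call, or a final answer. The sketch below only illustrates that control flow under stated assumptions: the model is replaced by a scripted stand-in, and `Step`, `NextStep`, `runLoop`, and the `search` tool are hypothetical names, not part of any Anthropic or AI Gateway API.

```typescript
// One step the model may emit: a reasoning note, a tool request, or a final answer.
type Step =
  | { kind: 'think'; text: string }
  | { kind: 'tool'; name: string; input: string }
  | { kind: 'answer'; text: string }

type Tool = (input: string) => string

// Hypothetical stand-in for the model: returns the next step given the transcript so far.
type NextStep = (transcript: string[]) => Step

function runLoop(
  next: NextStep,
  tools: Record<string, Tool>,
  maxSteps = 20,
): { answer: string; transcript: string[] } {
  const transcript: string[] = []
  for (let i = 0; i < maxSteps; i++) {
    const step = next(transcript)
    if (step.kind === 'answer') return { answer: step.text, transcript }
    if (step.kind === 'think') {
      transcript.push(`think: ${step.text}`)
    } else {
      // Run the requested tool and feed its output back into the transcript.
      const tool = tools[step.name]
      const output = tool ? tool(step.input) : `unknown tool ${step.name}`
      transcript.push(`tool ${step.name}(${step.input}) -> ${output}`)
    }
  }
  throw new Error('step limit reached without an answer')
}

// Scripted "model" that searches, reasons, searches again, then answers.
const script: Step[] = [
  { kind: 'think', text: 'need background first' },
  { kind: 'tool', name: 'search', input: 'rayleigh scattering' },
  { kind: 'think', text: 'follow up on wavelength dependence' },
  { kind: 'tool', name: 'search', input: 'scattering vs wavelength' },
  { kind: 'answer', text: 'synthesized from 2 searches' },
]
const scripted: NextStep = (t) => script[t.length]
const tools: Record<string, Tool> = { search: (q) => `results for "${q}"` }

const run = runLoop(scripted, tools)
console.log(run.answer) // synthesized from 2 searches
```

The `maxSteps` cap is a design choice for the sketch: a long-running loop needs some bound so that a model that never produces an answer cannot run indefinitely.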

The Claude 4 generation reduced shortcut-taking behavior by 65% compared to Sonnet 3.7 on agentic tasks particularly susceptible to that failure mode. This is an important reliability property for production agent deployments where gaming a metric rather than solving the underlying problem is a real risk.

What To Consider When Choosing a Provider

  • Configuration: Opus 4's higher per-token cost and long-running session profile make AI Gateway's cost tracking particularly useful. Observability from the first request helps prevent budget surprises on multi-hour jobs.
  • Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
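
To make the cost point in the list above concrete, a rough per-job estimate can be computed from token counts and the per-million-token rates in the provider table. This is a back-of-the-envelope sketch, not AI Gateway's billing logic: the `Usage` shape, the `estimateCostUsd` helper, and the example token counts are assumptions, and the rates are a snapshot from this page that may change.

```typescript
// Per-million-token rates from the provider table on this page (USD); they may change.
const RATES = {
  inputPerM: 15.0,
  outputPerM: 75.0,
  cacheReadPerM: 1.5,
  cacheWritePerM: 18.75,
}

interface Usage {
  inputTokens: number // uncached input tokens
  outputTokens: number
  cacheReadTokens?: number
  cacheWriteTokens?: number
}

function estimateCostUsd(u: Usage): number {
  const perToken = (ratePerM: number) => ratePerM / 1_000_000
  return (
    u.inputTokens * perToken(RATES.inputPerM) +
    u.outputTokens * perToken(RATES.outputPerM) +
    (u.cacheReadTokens ?? 0) * perToken(RATES.cacheReadPerM) +
    (u.cacheWriteTokens ?? 0) * perToken(RATES.cacheWritePerM)
  )
}

// Hypothetical multi-hour job: 2M input tokens, 400K output tokens, 10M cached reads.
const cost = estimateCostUsd({
  inputTokens: 2_000_000,
  outputTokens: 400_000,
  cacheReadTokens: 10_000_000,
})
console.log(cost.toFixed(2)) // "75.00"
```

In this made-up example the 400K output tokens cost as much as the 2M input tokens, which reflects the 5x gap between the listed output and input rates. Web search charges, where a provider offers them, are not included.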

When to Use Claude Opus 4

Best For

  • Long-horizon agentic tasks: Work that requires sustained focus across thousands of steps and multiple hours, validated by a seven-hour independent refactor run
  • Complex codebase understanding and modification: 72.5% on SWE-bench Verified and 43.2% on Terminal-bench
  • Research and analysis workflows: Tasks that benefit from extended thinking with tool use, where reasoning is interleaved with web search or other external tools
  • Scientific discovery and R&D tasks: Cases where analytical depth and domain knowledge are the binding constraints
  • Production agent deployments: The 65% reduction in shortcut-taking behavior improves reliability

Consider Alternatives When

  • Per-token cost is the constraint: Sonnet 4 delivers strong performance at significantly lower cost and matched or exceeded Opus 4 on SWE-bench
  • Response latency is critical: Sonnet variants are faster for interactive use
  • Tasks are short and bounded: The capability differential over Sonnet shrinks when multi-hour sustained attention isn't needed
  • You need a 1M context window: Opus 4 is listed at 200K; 1M context came to Sonnet 4 later and to Opus models with 4.6

Conclusion

Claude Opus 4 demonstrated sustained agentic performance at the Claude 4 generation's launch. It solves hard problems and maintains coherence over hours of work. Teams building long-horizon coding agents, research pipelines, or autonomous engineering workflows have concrete reference points in the benchmark data and early customer validation.

Frequently Asked Questions

  • What SWE-bench and Terminal-bench scores did Claude Opus 4 achieve?

    Opus 4 scored 72.5% on SWE-bench Verified and 43.2% on Terminal-bench, both without extended thinking.

  • How long can Claude Opus 4 run an agentic task without losing coherence?

    Rakuten validated a seven-hour independent run on a demanding open-source refactoring task with sustained performance. Anthropic described the model as capable of working continuously for several hours.

  • What is extended thinking with tool use in Claude Opus 4?

    A beta capability introduced with the Claude 4 launch. The model alternates between extended reasoning and tool calls within a single session. For example, it can think about a problem, run a web search, reason about the results, search again, and synthesize across the chain.

  • How did Claude Opus 4 improve memory capabilities?

    When you provide local file access, Opus 4 creates and maintains memory files to store key facts and context. This enables better long-term coherence on extended tasks. Anthropic illustrated this with the model creating a navigation guide during autonomous Pokémon gameplay.

  • What was the shortcut-taking behavior reduction?

    Claude 4 models (Opus 4 and Sonnet 4) are 65% less likely to use shortcuts or loopholes to complete agentic tasks compared to Sonnet 3.7. This is a reliability improvement for production deployments where you need the model to solve the actual problem rather than gaming the metric.

  • How does Opus 4 pricing compare to Sonnet 4?

    Check the pricing panel on this page for today's numbers. AI Gateway tracks rates across every provider that serves Claude Opus 4.

  • Does Claude Opus 4 support thinking summaries?

    Yes. A smaller model condenses lengthy thought processes into summaries. Anthropic noted this is only needed about 5% of the time, when thoughts are too long to display in full.