Claude Opus 4 launched on May 22, 2025 alongside Claude Sonnet 4. Anthropic positioned it for demanding coding workloads, and the benchmark results backed that up: 72.5% on SWE-bench Verified and 43.2% on Terminal-bench. These scores were achieved without extended thinking, showing that Opus 4's baseline capability advanced meaningfully beyond previous models.
Sustained performance differentiated Opus 4 most distinctly from its predecessors. Rakuten validated the model with a demanding open-source refactor that ran independently for seven hours, maintaining focus and coherence over hundreds of individual steps. Cursor called it strong for coding and a leap forward in complex codebase understanding. Block reported it was the first model to boost code quality during editing and debugging in their agent (codename goose) while maintaining full reliability. Cognition noted Opus 4 handled critical actions that previous models had missed on complex challenges.
The Claude 4 launch introduced extended thinking with tool use in beta. Both Opus 4 and Sonnet 4 can alternate between reasoning and tool use like web search during a single extended thinking session. This enables research patterns where Claude searches, reasons about results, searches again based on that reasoning, and synthesizes across the full chain. Memory capabilities also improved substantially: when given local file access, Opus 4 creates and maintains memory files to store key information, enabling better long-term coherence on extended tasks.
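The search-reason-search-synthesize pattern, combined with a local memory file, can be sketched as a toy agent loop. This is a minimal illustration, not the Anthropic API: `search`, `run_agent`, the canned corpus, and the `memory.json` layout are all hypothetical stand-ins for the model's real tool calls and reasoning steps.

```python
import json
from pathlib import Path

def search(query):
    # Hypothetical stub for a web-search tool: returns canned results
    # so the loop structure is runnable for illustration.
    corpus = {
        "claude 4 launch": ["Opus 4 and Sonnet 4 announced"],
        "opus 4 benchmarks": ["72.5% SWE-bench Verified", "43.2% Terminal-bench"],
    }
    return corpus.get(query, [])

def run_agent(task, queries, memory_path="memory.json"):
    """Alternate tool use and 'reasoning', persisting key facts to a memory file.

    In the real system the model decides what to search next based on its
    own extended thinking; here the query sequence is supplied up front.
    """
    memory_file = Path(memory_path)
    memory = json.loads(memory_file.read_text()) if memory_file.exists() else {}
    findings = []
    for query in queries:
        results = search(query)       # tool-use step
        findings.extend(results)      # reasoning over results (stubbed)
        memory[query] = results       # store key information for later turns
        memory_file.write_text(json.dumps(memory, indent=2))
    # Final synthesis across the whole search-and-reason chain.
    return {"task": task, "synthesis": findings}
```

Persisting `memory.json` after every step is what gives the loop long-task coherence: a fresh session can reload the file and pick up where the last one stopped, mirroring how Opus 4 uses local file access for memory.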
The Claude 4 generation reduced shortcut-taking behavior by 65% compared to Sonnet 3.7 on agentic tasks particularly susceptible to that failure mode. This is an important reliability property for production agent deployments, where gaming a metric rather than solving the underlying problem is a real risk.