Question 1

What SWE-bench and Terminal-bench scores did Claude Opus 4 achieve?

Accepted Answer

Opus 4 scored 72.5% on SWE-bench Verified and 43.2% on Terminal-bench, both without extended thinking.

Question 2

How long can Claude Opus 4 run an agentic task without losing coherence?

Accepted Answer

Rakuten validated a seven-hour independent run on a demanding open-source refactoring task with sustained performance. Anthropic described the model as capable of working continuously for several hours.

Question 3

What is extended thinking with tool use in Claude Opus 4?

Accepted Answer

A beta capability introduced with the Claude 4 launch. The model alternates between extended reasoning and tool calls within a single session. For example, it can think about a problem, run a web search, reason about the results, search again, and synthesize across the chain.

Question 4

How did Claude Opus 4 improve memory capabilities?

Accepted Answer

When you provide local file access, Opus 4 creates and maintains memory files to store key facts and context. This enables better long-term coherence on extended tasks. Anthropic illustrated this with the model creating a navigation guide during autonomous Pokémon gameplay.

Question 5

What was the shortcut-taking behavior reduction?

Accepted Answer

Claude 4 models (Opus 4 and Sonnet 4) are 65% less likely to use shortcuts or loopholes to complete agentic tasks compared to Sonnet 3.7. This is a reliability improvement for production deployments where you need the model to solve the actual problem rather than gaming the metric.

Question 6

How does Opus 4 pricing compare to Sonnet 4?

Accepted Answer

Check the pricing panel on this page for today's numbers. AI Gateway tracks rates across every provider that serves Claude Opus 4.

Question 7

Does Claude Opus 4 support thinking summaries?

Accepted Answer

Yes. A smaller model condenses lengthy thought processes into summaries. Anthropic noted this is only needed about 5% of the time, when thoughts are too long to display in full.

Agent Stack

Core Platform

Tools

Learn

Build

Explore

Claude Opus 4

Frequently Asked Questions