Question 1

What is computer use and how does it work in Claude 3.5 Sonnet?

Accepted Answer

Computer use is an API capability that lets Claude interact with computers as people do, perceiving screen state via screenshots, moving a cursor, clicking buttons, and typing. Developers integrate the API and pass instructions like "fill out this form using data from my spreadsheet," which Claude translates into individual computer commands.
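The translation from one instruction into individual commands can be pictured as a loop: the model asks for an action, the host application performs it and reports the resulting screen state back. Below is a minimal, self-contained sketch of the host side of that loop. `FakeScreen`, `execute_action`, `run_plan`, and the action dictionaries are all hypothetical names and shapes chosen for illustration. They are not the API's actual schema, and the scripted `plan` stands in for commands the model would produce.

```python
from dataclasses import dataclass, field


@dataclass
class FakeScreen:
    """Hypothetical stand-in for a real desktop (assumption, not the API)."""
    cursor: tuple = (0, 0)
    typed: list = field(default_factory=list)
    clicks: list = field(default_factory=list)

    def screenshot(self) -> str:
        # A real integration would return image data; a string is enough here.
        return f"<screenshot cursor={self.cursor} typed={''.join(self.typed)!r}>"


def execute_action(screen: FakeScreen, action: dict) -> str:
    """Apply one command to the screen and return the new observation."""
    kind = action["action"]
    if kind == "screenshot":
        pass  # nothing to change; the observation is returned below
    elif kind == "mouse_move":
        screen.cursor = tuple(action["coordinate"])
    elif kind == "left_click":
        screen.clicks.append(screen.cursor)
    elif kind == "type":
        screen.typed.append(action["text"])
    else:
        raise ValueError(f"unsupported action: {kind}")
    return screen.screenshot()


def run_plan(screen: FakeScreen, plan: list) -> list:
    """Execute each command in order, collecting one observation per step."""
    return [execute_action(screen, step) for step in plan]


# Scripted commands standing in for what the model might request
# when asked to fill in a name field (illustrative assumption).
plan = [
    {"action": "screenshot"},
    {"action": "mouse_move", "coordinate": [120, 240]},
    {"action": "left_click"},
    {"action": "type", "text": "Ada Lovelace"},
]

screen = FakeScreen()
observations = run_plan(screen, plan)
print(observations[-1])
```

In a real integration, each observation would be sent back to the model so it can decide the next command. The sketch replays a fixed plan to keep the example runnable offline.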

Question 2

Was computer use production-ready at launch?

Accepted Answer

No. Anthropic explicitly described it as experimental at the October 2024 launch: capable but at times cumbersome and error-prone. They released it early to gather developer feedback, expecting rapid improvement.

Question 3

How much did SWE-bench Verified improve between the June and October 2024 Claude 3.5 Sonnet versions?

Accepted Answer

The October upgrade raised the score from 33.4% to 49.0%. Anthropic stated that this was higher than all publicly available models at that time, including reasoning models and systems specialized for agentic coding.

Question 4

Did the October 2024 upgrade change Claude 3.5 Sonnet's pricing?

Accepted Answer

No. Anthropic released the upgraded model at the same price as its predecessor. Input pricing, output pricing, and the context window were unchanged.

Question 5

Which companies were using computer use at launch?

Accepted Answer

Asana, Canva, Cognition, DoorDash, Replit, and The Browser Company were building with computer use at launch. Replit used it to evaluate apps during construction. The Browser Company applied it to web-based workflow automation.

Question 6

How does this version differ from the June 2024 Claude 3.5 Sonnet (claude-3.5-sonnet-20240620)?

Accepted Answer

The October upgrade added computer use and improved scores on coding and tool use benchmarks. The June version lacked computer use entirely and scored lower on SWE-bench Verified (33.4% versus 49.0%). Both versions share the same model family name but are distinct checkpoints.

Question 7

What tool use improvements came with this upgrade?

Accepted Answer

TAU-bench tool use scores improved from 62.6% to 69.2% in the retail domain and from 36.0% to 46.0% in the airline domain, reflecting gains in handling structured multi-step agentic interactions.
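To put the two domains on a common footing, the short sketch below works out the gain in percentage points and the relative gain over the earlier score. The scores are the ones quoted above. The `gains` helper and the rounding to one decimal place are illustrative choices, not part of any published methodology.

```python
# TAU-bench scores quoted in the answer above, in percent: (before, after).
scores = {
    "retail": (62.6, 69.2),
    "airline": (36.0, 46.0),
}


def gains(before: float, after: float) -> tuple:
    """Return (percentage-point gain, relative gain in percent), rounded to 0.1."""
    points = after - before
    relative = points / before * 100
    return round(points, 1), round(relative, 1)


for domain, (before, after) in scores.items():
    points, relative = gains(before, after)
    print(f"{domain}: +{points} points ({relative}% relative)")
```

By this arithmetic, the airline domain gains 10.0 points (about 27.8% relative) and the retail domain gains 6.6 points (about 10.5% relative), so the larger improvement came in the domain with the lower starting score.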

Claude 3.5 Sonnet

Frequently Asked Questions