The upgraded Claude 3.5 Sonnet (October 2024) is the first publicly available model to offer computer use in public beta. It raises SWE-bench Verified from 33.4% to 49.0% and delivers across-the-board coding and tool use improvements at the same price as its predecessor.
```typescript
import { streamText } from 'ai'

const result = streamText({
  model: 'anthropic/claude-3.5-sonnet',
  prompt: 'Why is the sky blue?',
})
```

What To Consider When Choosing a Provider
- Configuration: Computer use tasks involving dozens or hundreds of steps generate substantial token volumes. AI Gateway's cost tracking lets you measure actual token consumption per session rather than estimating upfront.
- Zero Data Retention: AI Gateway supports Zero Data Retention for this model on direct gateway requests (BYOK requests are not covered). See the AI Gateway documentation for how to enable it.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
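As a concrete illustration of the cost-tracking point above, the sketch below sums per-step token usage across a long session. The `StepUsage` shape and field names are illustrative assumptions, not AI Gateway's actual usage schema:

```typescript
// Hypothetical sketch: aggregate token usage across the steps of a
// multi-step computer-use session. Field names are assumptions for
// illustration, not AI Gateway's reporting schema.
interface StepUsage {
  inputTokens: number
  outputTokens: number
}

function sessionTotals(steps: StepUsage[]): StepUsage {
  return steps.reduce(
    (acc, s) => ({
      inputTokens: acc.inputTokens + s.inputTokens,
      outputTokens: acc.outputTokens + s.outputTokens,
    }),
    { inputTokens: 0, outputTokens: 0 },
  )
}

// Even a short two-step session adds up:
const totals = sessionTotals([
  { inputTokens: 1200, outputTokens: 300 },
  { inputTokens: 1850, outputTokens: 410 },
])
// totals: { inputTokens: 3050, outputTokens: 710 }
```

Recording measured totals like this per session is what makes after-the-fact cost attribution possible for agentic workloads whose step counts vary widely.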
When to Use Claude 3.5 Sonnet
Best For
- Computer use and UI automation: The first model to offer this in public beta, suited for tasks requiring navigation of real software interfaces
- Complex software engineering tasks at scale: Its 49.0% SWE-bench Verified score placed it above every publicly available model at the time of release
- Multi-step agentic coding: Tool calls, test execution, and full-stack workflows
- DevSecOps and code review pipelines: GitLab reported stronger reasoning on multi-step software development processes
- Web-based workflow automation: Tasks where the model must navigate sites, fill forms, and extract data across sessions
Consider Alternatives When
- Highest-volume throughput: Claude 3.5 Haiku is faster and cheaper for tasks that don't require computer use or deep coding
- Deterministic automation: Anthropic explicitly described computer use as experimental and sometimes cumbersome
- Text-only generation: Tasks without agentic complexity don't benefit from computer use capabilities
- Extended thinking mode: This capability arrived with Claude 3.7 Sonnet; hard reasoning problems that benefit from it should target that model
Conclusion
Claude 3.5 Sonnet (October 2024) marked an inflection point: it posted strong real-world software engineering benchmark results at release and was the first to make computer interaction available to developers via an API. Teams building agentic pipelines that involve actual software interfaces, not just code generation, have a concrete reason to evaluate this version.
Frequently Asked Questions
What is computer use and how does it work in Claude 3.5 Sonnet?
Computer use is an API capability that lets Claude interact with computers as people do, perceiving screen state via screenshots, moving a cursor, clicking buttons, and typing. Developers integrate the API and pass instructions like "fill out this form using data from my spreadsheet," which Claude translates into individual computer commands.
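In practice, the October 2024 beta exposes this as a tool definition on the Messages API. The sketch below shows the tool shape from Anthropic's launch documentation; treat the exact type string, beta flag, and display dimensions as details to verify against current docs:

```typescript
// Computer-use tool definition as introduced in the October 2024 beta.
// Type string and beta flag follow Anthropic's launch docs; verify
// against current documentation before relying on them.
const computerTool = {
  type: 'computer_20241022',
  name: 'computer',
  display_width_px: 1280,
  display_height_px: 800,
} as const

// Sent with the Messages API (beta flag: computer-use-2024-10-22),
// this leads Claude to emit tool_use actions such as 'screenshot',
// 'left_click', and 'type'. Your own harness executes each action
// against a real or virtual display and replies with tool_result
// blocks, looping until the task completes.
```

The agent loop is the developer's responsibility: Claude decides which command to issue next, but executing it and capturing the resulting screenshot happens in your code.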
Was computer use production-ready?
No. Anthropic explicitly described it as experimental at the October 2024 launch: capable but at times cumbersome and error-prone. They released it early to gather developer feedback, expecting rapid improvement.
How much did SWE-bench Verified improve between the June and October 2024 Claude 3.5 Sonnet versions?
The October upgrade moved the score from 33.4% to 49.0%, which Anthropic stated was higher than all publicly available models at that time, including other high-performing reasoning models and specialized agentic coding systems.
Did the October 2024 upgrade change Claude 3.5 Sonnet's pricing?
No. Anthropic released the upgraded model at the same price as its predecessor. Input, output, and context window specs remained consistent.
What companies were using computer use?
Asana, Canva, Cognition, DoorDash, Replit, and The Browser Company were building with computer use at launch. Replit used it to evaluate apps during construction. The Browser Company applied it to web-based workflow automation.
How does this version differ from the June 2024 Claude 3.5 Sonnet (claude-3.5-sonnet-20240620)?
The October upgrade added computer use capabilities and significantly improved coding and tool use benchmarks. The June version lacked computer use entirely and had lower SWE-bench scores. Both versions share the same model family name but are distinct checkpoints.
What tool use improvements came with this upgrade?
TAU-bench tool use scores improved from 62.6% to 69.2% in the retail domain and from 36.0% to 46.0% in the airline domain, reflecting gains in handling structured multi-step agentic interactions.