Claude 3.7 Sonnet launched on January 25, 2024 with a core design principle: reasoning should not be a separate model but an integrated mode of the same system. Rather than routing to a distinct "thinking" model for hard problems, Claude 3.7 Sonnet lets you toggle between standard response mode and extended thinking within one API endpoint, with API-level control over the thinking token budget (up to 128K tokens per Anthropic at launch).
Prompting techniques transfer between modes. Patterns that work for standard responses generally work in extended thinking mode too, reducing the overhead of maintaining two prompt strategies. Anthropic designed the reasoning improvements to focus on real-world task performance rather than math and competition benchmark optimization, a deliberate shift from other reasoning models.
Coding was the standout improvement area. Claude 3.7 Sonnet scored 63.7% on SWE-bench Verified (70.3% with enhanced compute). Cursor called it strong for real-world coding tasks with significant gains in complex codebases and advanced tool use. Cognition found it far better than other models they tested at planning code changes and handling full-stack updates. Anthropic highlighted its precision for complex agent workflows. Replit deployed it for building sophisticated web apps from scratch where other models stalled. Canva's evaluations showed consistently production-ready code output with superior design taste.
Front-end web development was specifically called out as a strength area. The model also shipped with a 45% reduction in unnecessary refusals compared to its predecessor and the introduction of Claude Code, a command-line agentic coding tool in limited research preview.