
Anthropic Launches Claude Opus 4.8 and Sonnet 4.6 — Stronger Coding, Agents, Reasoning

Image: Anthropic

TL;DR

  • Anthropic released Claude Opus 4.8 (May 28, 2026) and Sonnet 4.6 (Feb 17, 2026) — two model upgrades delivering stronger coding, agentic reasoning, and larger context windows.
  • Opus 4.8 adds effort control (Low→High→Extra→Max), dynamic workflows with hundreds of parallel subagents, and a 3× cheaper fast mode; leads on Terminal-Bench 2.1, OSWorld, Super-Agent, and legal benchmarks.
  • Sonnet 4.6 gets a 1M-token context window (beta), matches Opus 4.6 on SWE-bench Verified (80.2%), and scores 94% on the Pace Insurance Benchmark for computer use. It costs $3/$15 per million input/output tokens and is now the default for Free and Pro plans.

What Happened

Anthropic announced two model releases about three months apart on its official blog. Claude Sonnet 4.6 launched on February 17, 2026 as the most capable Sonnet yet, with a 1M-token context window (beta); early-access users preferred it over Opus 4.5 59% of the time. Claude Opus 4.8 followed on May 28, 2026 as a significant upgrade to the flagship model, improving coding, agentic reasoning, and knowledge work while keeping Opus 4.7 pricing.

Both announcements included detailed benchmark data, enterprise testimonials, and new platform features shipping alongside the models.

Source: Claude Opus 4.8 announcement | Claude Sonnet 4.6 announcement

[IMAGE: Anthropic blog announcement split showing Opus 4.8 and Sonnet 4.6 model cards with key metrics]

Key Details

Claude Opus 4.8 (released May 28, 2026):

  • Benchmark leadership: Leading scores on Terminal-Bench 2.1, OSWorld-Verified, and Super-Agent (the only model to complete every case end-to-end). Exceeds prior Opus at every effort level on CursorBench. Achieved the highest Legal Agent Benchmark score recorded, breaking the 10% all-pass threshold for the first time. Scored 84% on Online-Mind2Web for browser-based agent tasks.
  • Honesty and reliability: Approximately 4× less likely than Opus 4.7 to allow code flaws to pass unremarked. More likely to flag uncertainties and proactively identify input/output issues in analyses.
  • Alignment: Misaligned behavior rates (deception, misuse cooperation) substantially lower than Opus 4.7 and similar to Claude Mythos Preview, Anthropic’s best-aligned model to date. Full assessment in the System Card.
  • Effort Control (new UI): Low → High (default) → Extra (xhigh in Claude Code) → Max settings. High effort matches Opus 4.7 default token spend with better performance; Extra/Max allocate more tokens for difficult tasks and long-running async workflows. Rate limits increased in Claude Code. Available on all plans.
  • Dynamic Workflows (Research Preview): Available in Claude Code on Enterprise, Team, and Max plans. Claude plans the work, runs hundreds of parallel subagents in a single session, verifies their outputs, and reports back. Handles codebase-scale migrations across hundreds of thousands of lines from kickoff to merge.
  • Fast Mode: 3× cheaper than standard mode for latency-sensitive workloads.
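The plan, fan-out, verify, report loop described for Dynamic Workflows can be pictured with a small concurrency sketch. This is only an illustration of the general pattern, not Anthropic's implementation: `run_subagent`, `verify`, `run_workflow`, and the `max_parallel` cap are hypothetical names, and the "subagent" here is a stub that does no model call.

```python
import asyncio


async def run_subagent(task: str) -> dict:
    """Hypothetical stand-in for one subagent; a real one would call a model."""
    await asyncio.sleep(0)  # yield control, simulating I/O-bound work
    return {"task": task, "output": f"migrated {task}"}


def verify(result: dict) -> bool:
    """Toy verification step: accept only outputs that mention their task."""
    return result["task"] in result["output"]


async def run_workflow(tasks: list[str], max_parallel: int = 100) -> dict:
    """Fan tasks out to subagents, verify each output, and summarize."""
    limiter = asyncio.Semaphore(max_parallel)  # cap concurrent subagents

    async def guarded(task: str) -> dict:
        async with limiter:
            return await run_subagent(task)

    # gather preserves input order, so results line up with tasks
    results = await asyncio.gather(*(guarded(t) for t in tasks))
    failed = [r["task"] for r in results if not verify(r)]
    return {
        "total": len(results),
        "verified": len(results) - len(failed),
        "failed": failed,
    }


# Hypothetical plan: one task per file in a 300-file migration
report = asyncio.run(run_workflow([f"module_{i}.py" for i in range(300)]))
print(report)
```

The semaphore is the design choice worth noting: it lets a single session queue hundreds of tasks while bounding how many run at once, which is one common way to keep a fan-out within rate limits.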

Claude Sonnet 4.6 (released February 17, 2026):

  • Pricing unchanged: $3 per million input tokens / $15 per million output tokens (same as Sonnet 4.5). Default model for Free and Pro plans on claude.ai and Claude Cowork.
  • 1M-token context window (beta): Effective reasoning across the full context length.
  • Developer preference: Developers preferred Sonnet 4.6 over Sonnet 4.5 70% of the time in Claude Code evaluations, and over Opus 4.5 59% of the time in early access.
  • Computer use milestone: Human-level capability reported on complex spreadsheets, multi-step web forms, and multi-tab browser workflows. Prompt injection resistance similar to Opus 4.6.
  • Benchmarks: 80.2% on SWE-bench Verified (averaged over 10 trials with prompt modification). 94% on Insurance Benchmark (Pace) — highest-performing model tested for computer use. +15 percentage points over Sonnet 4.5 on Box Enterprise Eval heavy reasoning Q&A. Matches Opus 4.6 on OfficeQA (charts, PDFs, tables).
  • Vending-Bench Arena: Demonstrated novel long-horizon planning strategy — heavy capacity investment for first 10 simulated months, then sharp pivot to profitability, finishing well ahead of competitors.
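For budget planning, the Sonnet 4.6 list price quoted above ($3 per million input tokens, $15 per million output tokens) translates into a simple per-request estimate. The sketch below assumes the flat rate applies to the whole request; the article does not say whether long-context or cached requests are billed differently, and the function name and token counts are illustrative.

```python
# Sonnet 4.6 list prices as quoted in this article (USD per million tokens)
INPUT_USD_PER_M = 3.00
OUTPUT_USD_PER_M = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate one request's cost, assuming flat per-token pricing."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Hypothetical request: 200K tokens of context in, 4K tokens out
cost = estimate_cost_usd(200_000, 4_000)
print(f"${cost:.2f}")  # $0.66
```

Output tokens cost five times as much as input tokens at these rates, so long generations can dominate the bill even when the prompt is large.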

Enterprise Deployments:

  • 11 organizations cited for Opus 4.8 (Cursor, Replit, Vercel, Sourcegraph, Notion, etc.)
  • 15 organizations cited for Sonnet 4.6 (Vercel, Replit, Cursor, etc.)

What Changed

| Area | Before (Opus 4.7 / Sonnet 4.5) | After (Opus 4.8 / Sonnet 4.6) | Impact |
|------|------|------|------|
| Coding / Reasoning | Strong but limited effort control | Effort Control UI (Low→High→Extra→Max); dynamic workflows with parallel subagents | Precise token spend control; async long-running agents |
| Context Window | 200K tokens | 1M tokens (beta) on Sonnet 4.6 | Large codebase / document reasoning in single call |
| Pricing | Opus 4.7 / Sonnet 4.5 rates | Opus 4.8 at Opus 4.7 pricing, plus a 3× cheaper Fast Mode; Sonnet 4.6 unchanged at $3/$15 per M tokens | Better price/performance; latency-sensitive workloads cheaper |
| Agent Capabilities | Basic tool use | Dynamic Workflows: hundreds of parallel subagents, output verification, async reporting | Codebase-scale migrations, long-horizon tasks |
| Coding Benchmarks | 75.4% SWE-bench (Sonnet 4.5) | 80.2% SWE-bench Verified (Sonnet 4.6) | Near-Opus coding at Sonnet pricing |
| Computer Use | Basic | 94% on Insurance Benchmark (Pace); human-level capability reported on spreadsheets, web forms, and multi-tab workflows | Production-ready automation |
| Alignment / Safety | Opus 4.7 baseline | About 4× less likely to let code flaws pass unremarked; lower misuse/deception rates | More trustworthy for production |

Why It Matters

Anthropic is executing a clear two-tier strategy: Opus for maximum capability (reasoning, agents, long-horizon tasks) and Sonnet for price/performance (coding, computer use, everyday workloads). The 1M-token context on Sonnet 4.6 is a significant escalation — it brings near-Opus context to the affordable tier, enabling whole-codebase reasoning and large-document analysis at Sonnet pricing.
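To make the jump from 200K to 1M tokens concrete, a rough back-of-the-envelope check can tell whether a codebase would fit in a single call. The sketch below assumes about four characters per token, which is a common rule of thumb and not a figure from Anthropic; real tokenizers vary by language and content, and the repository size is hypothetical.

```python
CHARS_PER_TOKEN = 4  # rough heuristic, an assumption; real tokenizers vary


def estimated_tokens(total_chars: int) -> int:
    """Estimate token count from character count, rounding up."""
    return -(-total_chars // CHARS_PER_TOKEN)  # ceiling division


def fits(total_chars: int, window_tokens: int, reserve_tokens: int = 0) -> bool:
    """Check whether the text plus a reserved output budget fits the window."""
    return estimated_tokens(total_chars) + reserve_tokens <= window_tokens


codebase_chars = 2_400_000  # hypothetical repository size in characters
print(estimated_tokens(codebase_chars))  # 600000
print(fits(codebase_chars, 200_000))     # False
print(fits(codebase_chars, 1_000_000))   # True
```

Under these assumptions, a repository of this size would need to be chunked for a 200K window but could be passed whole to a 1M window, with room left for instructions and output.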

The Effort Control feature on Opus 4.8 is a practical UX improvement: developers can now dial compute per task rather than paying max tokens for everything. Combined with Dynamic Workflows (parallel subagents), this enables cost-effective long-horizon agent workflows — previously a major blocker for production agent deployments.

Sonnet 4.6 becoming the default for Free and Pro plans means millions of claude.ai users get 1M context and near-Opus coding automatically — a massive distribution win.

Who It Affects

  • Developers using Claude Code / Cursor / VS Code: Sonnet 4.6 pairs a 1M-token context window (beta) with 80.2% on SWE-bench Verified, so coding assistance improves wherever the model is selected.
  • Enterprise teams on Enterprise/Team/Max plans: Dynamic Workflows (parallel subagents) unlock codebase-scale migrations and long-horizon agent tasks.
  • Cost-sensitive teams: Fast Mode (3× cheaper) + Sonnet 4.6 default = lower API bills for latency-sensitive workloads.
  • Safety/compliance teams: 4× fewer unremarked code flaws + lower misalignment rates = easier production approval.

What to Watch Next

  1. Dynamic Workflows GA timeline — Currently Research Preview for Enterprise/Team/Max; GA date will determine enterprise adoption velocity.
  2. Opus 4.8 / Sonnet 4.6 API pricing — Fast Mode 3× cheaper claims need per-token pricing clarification for budget planning.
  3. System Card full release — Alignment/misuse assessments will inform enterprise risk assessments and compliance reviews.
  4. Anthropic’s response to Google I/O / OpenAI GPT-5 — Competitive pressure may accelerate next-gen releases (Opus 5 / Sonnet 5).
  5. Independent benchmark replication — Current scores are Anthropic-reported; community verification on SWE-bench, Terminal-Bench, OSWorld pending.
  6. Model deprecation schedule — Sonnet 3.5 and Opus 4.5 retirement dates; migration path for existing production workloads.

Last updated: Jun 15, 2026.
Aira

Founding Editor and Publisher of ZBrandCo, covering artificial intelligence, open-source software, and the developer tools people actually use. Signal over hype: every story starts from a primary source and explains why it matters. ZBrandCo runs no paid reviews and no affiliate links. Tips and corrections: editorial@zbrandco.com.