GitHub has updated Copilot for VS Code with backend harness improvements and expanded Auto model routing designed to cut redundant token use across long agentic coding sessions.
The changes target multi-turn workflows that span planning, editing, debugging, and repeated tool calls over long sessions, a mounting pain point for enterprise teams and individual developers alike as agentic coding tasks become more complex.
For context, a 2-hour multi-file refactoring session with 25 total tool calls previously sent the same full repository context and system prompt on every single turn, even when only a small subset of that data was relevant to the current step. GitHub Blog
Copilot’s new harness cuts redundant context per turn
For each turn in a long VS Code session, the Copilot harness prepares recurring information including system instructions, full repository context, conversation history, available tool definitions, and current task state. Much of this data is duplicated across consecutive turns, with full tool schemas for unused tools adding fixed per-turn overhead for teams with access to dozens of integrated tools. These tools include MCP tools, terminal commands, file operations, and workspace search tools, per GitHub’s documentation. GitHub Blog
The updated harness addresses this redundancy with two key changes. First, prompt caching reuses computed model state for repeated prompt prefixes instead of recalculating them on every request. The system uses cache-control breakpoints aligned to Copilot’s context compaction cycles to preserve cached state across consecutive turns, eliminating redundant processing for static content that does not change between requests. GitHub Blog
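The prefix-reuse idea can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not Copilot’s harness or any provider’s API: the `PrefixCache` class, the segment strings, and the way breakpoints are expressed as segment counts are all invented for illustration. A real cache stores computed model state; this one only records which prefixes it has already seen.

```python
import hashlib


class PrefixCache:
    """Toy stand-in for provider-side prompt caching (hypothetical, not Copilot's API)."""

    def __init__(self) -> None:
        self._seen: set[str] = set()
        self.hits = 0
        self.misses = 0

    def process(self, segments: list[str], breakpoints: list[int]) -> int:
        """Return the number of leading segments served from cache.

        Each breakpoint is a segment count marking the end of a cacheable prefix.
        """
        cached_upto = 0
        for bp in sorted(breakpoints):
            prefix = "\x00".join(segments[:bp])
            key = hashlib.sha256(prefix.encode("utf-8")).hexdigest()
            if key in self._seen:
                cached_upto = bp      # prefix was already computed on an earlier turn
                self.hits += 1
            else:
                self._seen.add(key)   # first sighting: compute once, then remember
                self.misses += 1
        return cached_upto


# Assumed prompt layout: static parts first, conversation appended at the end.
turn_1 = ["<system prompt>", "<tool schemas>", "user: plan the refactor"]
turn_2 = turn_1 + ["assistant: here is a plan", "user: apply step 1"]

cache = PrefixCache()
first = cache.process(turn_1, [2, 3])      # nothing cached yet
second = cache.process(turn_2, [2, 3, 5])  # first three segments are reused
print(first, second)  # prints: 0 3
```

In this sketch, the second turn only pays for the two new segments. Changing anything near the start of the prompt, such as the system prompt, changes every prefix hash and forces a full recompute, which is why static content sits at the front.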
Second, tool search loads only the tool definitions relevant to the current task, on demand, rather than sending every full tool schema into context on every turn. For teams running long agentic workflows with access to dozens of integrated tools, this removes the fixed per-turn overhead of unused tool definitions.
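A minimal Python sketch shows the shape of on-demand tool loading. The matching rule here (plain keyword overlap), the `Tool` type, the tool names, and the token counts are all assumptions made for illustration. GitHub has not described its search method in this article, and the real implementations are provider-specific.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    keywords: frozenset[str]  # hypothetical index terms for the tool
    schema_tokens: int        # assumed size of the full schema in tokens


def search_tools(query: str, registry: list[Tool], limit: int = 3) -> list[Tool]:
    """Return up to `limit` tools whose keywords overlap the query, best match first."""
    words = set(query.lower().split())
    scored = []
    for tool in registry:
        overlap = len(words & tool.keywords)
        if overlap:
            scored.append((overlap, tool))
    # sort() is stable, so tools with equal scores keep their registry order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:limit]]


registry = [
    Tool("run_terminal", frozenset({"terminal", "shell", "command", "run"}), 400),
    Tool("read_file", frozenset({"read", "file", "open"}), 250),
    Tool("search_workspace", frozenset({"search", "workspace", "find", "grep"}), 300),
    Tool("mcp_issue_tracker", frozenset({"issue", "ticket", "tracker"}), 900),
]

selected = search_tools("search the workspace and read the failing test file", registry)
loaded = sum(t.schema_tokens for t in selected)
everything = sum(t.schema_tokens for t in registry)
print([t.name for t in selected], loaded, everything)
# prints: ['read_file', 'search_workspace'] 550 1850
```

With these made-up sizes, the turn carries 550 schema tokens instead of 1,850. The gap widens as the registry grows, because unmatched tools contribute nothing.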
The changes are built to work across extended sessions via these cache-control breakpoints and provider-specific tool search implementations. A full technical deep dive covering these implementation details is available for engineering teams building custom agentic workflows on top of Copilot. GitHub Blog
Auto model routing uses HyDRA to match tasks to models
The second major update expands Copilot’s existing Auto model selection feature, which automatically picks the underlying model for each request without requiring developers to manually tune model settings. Auto now uses the internal HyDRA routing model, which classifies incoming tasks by four core metrics: reasoning depth, code complexity, debugging difficulty, and tool orchestration needs. GitHub Blog
For example, a request to write new authentication middleware would be classified as high reasoning depth, high code complexity, and high tool orchestration needs. A request to rename a variable across a single file would be classified as low across all four metrics. GitHub’s internal evaluations found no single underlying model consistently outperformed all others across every task type.
Stronger, more expensive models only delivered measurable quality gains for tasks requiring deep reasoning, such as multi-step architectural design or complex root-cause bug diagnosis. For routine edits, code explanations, and focused small changes, more efficient models matched output quality at lower token cost, per GitHub’s testing. GitHub Blog
This finding departs from the common industry assumption that larger models always deliver better outputs for every coding task. GitHub instead frames model selection as a resource allocation problem rather than a strict quality tradeoff: the goal is to match each task to the lowest-cost model that meets the required quality bar. GitHub Blog
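The "lowest-cost model that clears the bar" framing can be sketched in a few lines of Python. Everything concrete below is an assumption for illustration: HyDRA’s actual scoring is not public in this article, so the 0-to-1 scores, the rule that the hardest dimension sets the bar, and the model names, capabilities, and prices are all invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskProfile:
    # The four dimensions named in the article, scored 0..1 (scale is assumed).
    reasoning_depth: float
    code_complexity: float
    debugging_difficulty: float
    tool_orchestration: float

    def required_capability(self) -> float:
        # Assumption: the most demanding dimension sets the quality bar.
        return max(
            self.reasoning_depth,
            self.code_complexity,
            self.debugging_difficulty,
            self.tool_orchestration,
        )


@dataclass(frozen=True)
class Model:
    name: str             # placeholder names, not real Copilot models
    capability: float     # hypothetical 0..1 quality score
    cost_per_mtok: float  # hypothetical price per million tokens


def pick_model(task: TaskProfile, models: list[Model]) -> Model:
    """Pick the cheapest model that meets the bar, else the most capable one."""
    bar = task.required_capability()
    eligible = [m for m in models if m.capability >= bar]
    if not eligible:
        return max(models, key=lambda m: m.capability)
    return min(eligible, key=lambda m: m.cost_per_mtok)


MODELS = [
    Model("small", 0.40, 1.0),
    Model("medium", 0.70, 3.0),
    Model("large", 0.95, 15.0),
]

rename_variable = TaskProfile(0.1, 0.1, 0.1, 0.1)
auth_middleware = TaskProfile(0.9, 0.8, 0.3, 0.8)

print(pick_model(rename_variable, MODELS).name)  # prints: small
print(pick_model(auth_middleware, MODELS).name)  # prints: large
```

The selection is a constrained minimization rather than a ranking by quality: capability acts as a threshold, and cost decides among the models that pass it.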
Cache-aware routing avoids mid-session model switches
To avoid undermining the efficiency gains from prompt caching, Auto now only routes to new models at two natural cache boundaries. The first is the opening turn of a conversation, when no cached state exists to preserve. The second is after context compaction, when Copilot summarizes older conversation turns and resets the prompt prefix to stay within context window limits. GitHub Blog
Between these boundaries, the selected model remains active to let the cache build across consecutive turns. Switching models mid-conversation would invalidate existing cached state, costing more tokens than the routing change would save, per GitHub’s engineering team. GitHub Blog
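The two-boundary rule is simple enough to express as a short Python sketch. The `StickyRouter` class and the `compacted` flag are hypothetical names chosen for illustration, and the `choose` function stands in for whatever the real router decides.

```python
from typing import Callable, Optional


class StickyRouter:
    """Re-route only at cache boundaries: the first turn, or right after compaction."""

    def __init__(self, choose: Callable[[str], str]) -> None:
        self._choose = choose               # task description -> model name
        self._active: Optional[str] = None  # None until the first turn is routed

    def route(self, task: str, *, compacted: bool = False) -> str:
        if self._active is None or compacted:
            # No cached prefix exists (or it was just reset), so switching is free.
            self._active = self._choose(task)
        # Otherwise keep the current model so its cached prefix stays valid.
        return self._active


def choose(task: str) -> str:
    # Stand-in classifier, purely illustrative.
    return "large" if "diagnose" in task else "small"


router = StickyRouter(choose)
history = [
    router.route("rename a variable"),                      # first turn: routed
    router.route("diagnose a race condition"),              # mid-session: held
    router.route("diagnose a race condition", compacted=True),  # boundary: re-routed
    router.route("rename a variable"),                      # mid-session: held
]
print(history)  # prints: ['small', 'small', 'large', 'large']
```

The second turn shows the tradeoff the article describes: a harder task arrives, but the router holds the current model until the next boundary rather than discarding the cache.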
The routing system also accounts for real-time model health, tracking five core metrics: availability, utilization, speed, error rates, and cost. This avoids routing to models that are technically capable of handling a task but currently overloaded or degraded, which helps keep response quality consistent even during periods of high global demand. GitHub Blog
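One way to picture a health-aware filter is the Python sketch below. The five fields mirror the metrics listed above, but the thresholds, the `ModelHealth` type, and the sample numbers are assumptions for illustration. The article does not say how GitHub weighs these signals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelHealth:
    name: str                 # placeholder names, not real Copilot models
    available: bool
    utilization: float        # fraction of capacity in use, 0..1
    tokens_per_second: float  # observed generation speed
    error_rate: float         # fraction of failed requests, 0..1
    cost_per_mtok: float      # hypothetical price per million tokens


def is_healthy(
    m: ModelHealth,
    max_utilization: float = 0.90,  # thresholds are illustrative assumptions
    min_tps: float = 20.0,
    max_error_rate: float = 0.05,
) -> bool:
    return (
        m.available
        and m.utilization <= max_utilization
        and m.tokens_per_second >= min_tps
        and m.error_rate <= max_error_rate
    )


def pick_healthy(candidates: list[ModelHealth]) -> Optional[ModelHealth]:
    """Return the cheapest healthy candidate, or None if all are degraded."""
    healthy = [m for m in candidates if is_healthy(m)]
    return min(healthy, key=lambda m: m.cost_per_mtok) if healthy else None


candidates = [
    ModelHealth("model-a", True, 0.97, 45.0, 0.01, 1.0),   # cheapest, but overloaded
    ModelHealth("model-b", True, 0.50, 40.0, 0.20, 2.0),   # elevated error rate
    ModelHealth("model-c", True, 0.60, 35.0, 0.01, 3.0),   # healthy
    ModelHealth("model-d", False, 0.00, 0.0, 0.00, 0.5),   # unavailable
]

choice = pick_healthy(candidates)
print(choice.name if choice else None)  # prints: model-c
```

Here the cheapest capable candidates are skipped because they are overloaded, erroring, or offline, and the filter settles on a pricier model that is currently healthy.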
For global users, the HyDRA model was trained on conversations across 16 language families, including CJK (Chinese, Japanese, Korean) and European language groups. Routing accuracy stays within four percentage points of the English baseline across all groups, with no statistically significant quality gap detected in GitHub’s internal testing. GitHub Blog
Improvements roll out first to VS Code, with Auto expanding to other surfaces
The harness improvements detailed today are shipping now for all Copilot for VS Code users on the latest stable release, with no opt-in required. GitHub has published a technical deep dive covering prompt caching implementation, cache-control breakpoint configuration, and provider-specific tool search implementation for engineering teams building custom agentic workflows. GitHub Blog
The expanded Auto model routing feature is currently rolling out across additional Copilot surfaces beyond VS Code. GitHub’s stated goal is to make task-aware model selection the default for all Copilot interactions across its product lineup, removing the need for users to choose a model manually. GitHub Blog
