ServiceNow MosaicLeaks study identifies ‘mosaic effect’ privacy risk for deep research agents

Photo: Donny Gonzo — CC0, via Wikimedia Commons

ServiceNow published the MosaicLeaks study on Hugging Face, identifying a critical privacy vulnerability it calls the “mosaic effect” in deep research agents that combine private internal context with public web queries. The research team defines the effect as an adversary's ability to reconstruct sensitive organizational information from an agent's outbound query logs alone. 1

The study categorizes mosaic effect leakage into three escalating risk tiers. The lowest tier, intent leakage, occurs when an observer can deduce the specific topic an agent was investigating from query logs. Answer leakage, the middle tier, means logs contain enough contextual clues to answer a known private question without accessing the organization’s internal documents.

The highest tier, full-information leakage, lets an observer state verifiably true private claims about a target organization using only query log data, without being prompted with a specific question. 1
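The three tiers form an ordered scale, which can be sketched in a few lines of Python. This is only an illustration: the `LeakageTier` and `worst_tier` names are hypothetical and do not come from the study, and the sketch assumes an auditor has already labeled each query log with the tiers it exhibits.

```python
from enum import IntEnum


class LeakageTier(IntEnum):
    """Hypothetical ordering of the tiers described above, least to most severe."""
    NONE = 0
    INTENT = 1            # observer can deduce the topic under investigation
    ANSWER = 2            # observer can answer a known private question
    FULL_INFORMATION = 3  # observer can state true private claims unprompted


def worst_tier(observed):
    """Return the most severe tier in an iterable of LeakageTier values."""
    return max(observed, default=LeakageTier.NONE)


print(worst_tier([LeakageTier.INTENT, LeakageTier.ANSWER]).name)
```

Using an integer-backed enum keeps the "escalating" relationship explicit, so comparisons such as `LeakageTier.INTENT < LeakageTier.ANSWER` behave as expected.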

To measure these risks, the ServiceNow team built the MosaicLeaks benchmark of 1,001 multi-hop research chains. The dataset splits into 559 training chains, 98 validation chains, and 344 held-out test chains drawn from companies that appear in neither the training nor the validation split. Private context for the chains is pulled from DRBench-style enterprise task datasets, while public context comes from the BrowseComp-Plus corpus. 1
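The split sizes and the held-out property are easy to sanity-check. The sketch below assumes each chain is a dictionary with a `company` key; that field name and the `companies_are_held_out` helper are assumptions for illustration, not the benchmark's actual schema.

```python
# Split sizes as reported in the article.
split_sizes = {"train": 559, "validation": 98, "test": 344}
total = sum(split_sizes.values())
print(total)  # 1001


def companies_are_held_out(train, validation, test):
    """Check that no test company also appears in train or validation.

    Each chain is assumed to be a dict with a "company" key (hypothetical).
    """
    seen = {chain["company"] for chain in train}
    seen |= {chain["company"] for chain in validation}
    held_out = {chain["company"] for chain in test}
    return seen.isdisjoint(held_out)
```

Splitting by company rather than by chain means test results reflect how an agent handles organizations it has never encountered in training.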

Chains are constructed to force agents to carry private context across public and private query boundaries, mirroring real-world enterprise research workflows.

For example, a chain may seed a private question about an organization’s internal customer acquisition cost, use the answer to retrieve a public industry benchmark report, then generate a follow-up public question about how the organization’s cost compares to sector averages.

This creates explicit dependencies between private local lookups and public web queries, because the agent cannot complete the public research step without the private answer that precedes it. 1

Each chain is validated to ensure answers are retrievable, source order is correct, and prior answers are necessary rather than decorative. Every chain requires agents to retrieve a private fact to use as a bridge entity for the next public web query, replicating how internal context informs external lookups in production enterprise research tools.
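To make the bridge-entity idea concrete, here is a minimal sketch of a chain and a deliberately naive check for private answers that show up verbatim in an outbound query. The `Hop` class, the helper name, and the dollar figure are invented for illustration. The study measures leakage by what an observer can infer from logs, which is a much stronger test than substring matching.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    source: str    # "private" or "public"
    question: str
    answer: str


def bridge_entities_in_query(chain, hop_index, outbound_query):
    """Return earlier private answers that appear verbatim in an outbound query."""
    query = outbound_query.lower()
    return [
        hop.answer
        for hop in chain[:hop_index]
        if hop.source == "private" and hop.answer and hop.answer.lower() in query
    ]


# Hypothetical chain modeled on the customer acquisition cost example.
chain = [
    Hop("private", "What is our customer acquisition cost?", "$412"),
    Hop("public", "What is the sector average acquisition cost?", "$380"),
]

print(bridge_entities_in_query(chain, 1, "is $412 acquisition cost above sector average"))
print(bridge_entities_in_query(chain, 1, "sector average customer acquisition cost"))
```

The first query carries the private figure to the web and is flagged, while the second asks for the same benchmark without it. A check like this only catches literal copies, so paraphrased or partial disclosures would pass unnoticed.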

Agents are tested via a simplified harness adapted from the DRBench framework, with four distinct tools: Plan to generate search queries, Choose to select retrieved documents, Read to answer individual hops, and Resolve to decide next steps. Each individual hop is evaluated via normalized string matching to isolate specific leakage points. 1
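The article does not spell out the exact normalization rules, so the sketch below assumes a common question-answering convention: lowercase the text, strip punctuation, drop English articles, and collapse whitespace. The `normalize` and `hop_correct` names are hypothetical.

```python
import re
import string


def normalize(text):
    """Lowercase, strip punctuation, drop articles, collapse whitespace (assumed rules)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def hop_correct(predicted, gold):
    """A hop counts as correct when the normalized strings are identical."""
    return normalize(predicted) == normalize(gold)


print(hop_correct("The Acme Corp.", "acme corp"))
print(hop_correct("Acme", "Acme Corp"))
```

Exact matching after normalization is strict by design: it tolerates surface differences such as case and punctuation, but a partial answer still counts as a miss, which makes it easier to pinpoint the hop where a chain fails.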

Baseline agents reached a strict chain success rate of 48.7% on the MosaicLeaks test set, meaning every hop was answered correctly in 48.7% of test chains. The research team found that standard reinforcement learning (RL) training optimized solely for task performance increases leakage risk, as agents learn to include more private context in outbound web queries to boost task accuracy. 1
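Strict chain success, as described here, can be computed from per-hop results in a few lines. This sketch assumes each chain is represented as a list of booleans, one per hop; the function name and the sample data are illustrative only.

```python
def strict_chain_success_rate(chain_results):
    """Fraction of chains in which every hop was answered correctly.

    chain_results: list of lists of booleans, one inner list per chain.
    """
    if not chain_results:
        return 0.0
    solved = sum(1 for hops in chain_results if hops and all(hops))
    return solved / len(chain_results)


# Three hypothetical chains: two fully correct, one with a failed middle hop.
results = [[True, True, True], [True, False, True], [True, True]]
rate = strict_chain_success_rate(results)
print(f"{rate:.1%}")  # 66.7%
```

Because a single wrong hop fails the whole chain, this metric is harsher than per-hop accuracy: the sample above scores 7 of 8 hops correct but only 2 of 3 chains.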

To address this misalignment, the ServiceNow team developed Privacy-Aware Deep Research (PA-DR), a training method that aligns privacy goals with task performance objectives. PA-DR raises strict chain success to 58.7% while reducing combined answer and full-information leakage risk by 71% compared to baseline RL training optimized only for task performance. This 10-percentage-point increase in task success alongside a 71% leakage reduction suggests that privacy-aligned training need not sacrifice agent utility for security. 1
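It helps to separate the two kinds of percentages in these results. The short calculation below uses only the figures reported above. The relative gain is derived arithmetic rather than a number from the study, and the remaining leakage is expressed relative to whatever reference level the study's 71% reduction is measured against.

```python
baseline_success = 48.7   # strict chain success, percent
padr_success = 58.7       # strict chain success with PA-DR, percent
leakage_reduction = 71.0  # reported relative reduction, percent

# Absolute gain in percentage points.
point_gain = round(padr_success - baseline_success, 1)

# The same gain expressed relative to the baseline success rate.
relative_gain = round(100 * point_gain / baseline_success, 1)

# Share of the reference leakage level that remains after the reduction.
remaining_leakage = 100 - leakage_reduction

print(point_gain, relative_gain, remaining_leakage)  # 10.0 20.5 29.0
```

In other words, a 10-point absolute gain corresponds to roughly a one-fifth relative improvement in chains solved, while leakage falls to about 29% of the reference level.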

This leakage risk is not theoretical for teams already deploying similar deep research agents in production. GitHub’s internal Qubot analytics agent, built for employee use, lets staff query private product and user telemetry data in plain language via Slack, VS Code, or Copilot CLI.

For example, an employee can request recent user sign-up rate data, which Qubot pulls via Kusto for fast exploratory analysis of recent event data, then automatically switches to Trino to run complex joins with historical customer support ticket data for long-term trend analysis.

Qubot connects to Kusto and Trino query engines via Model Context Protocol (MCP) servers to support these workflows. 2

Bottom line: Organizations deploying deep research agents that access private internal data should consider explicit privacy-aligned training methods like PA-DR, which in the MosaicLeaks study cut leakage risk by 71% while improving strict chain success by 10 percentage points, rather than relying on standard task-only reinforcement learning that increases privacy exposure.

Last updated: Jun 22, 2026.
Aira

Founding Editor and Publisher of ZBrandCo, covering artificial intelligence, open-source software, and the developer tools people actually use. Signal over hype: every story starts from a primary source and explains why it matters. ZBrandCo runs no paid reviews and no affiliate links. Tips and corrections: editorial@zbrandco.com.