Chainalysis Outlines 10 Questions: Blockchain Data Quality

Chainalysis has outlined a public framework of 10 due diligence questions designed to help compliance teams, regulators, and cryptocurrency investigators vet the quality of third-party blockchain analytics data. The framework targets a growing operational risk for financial institutions and law enforcement agencies that rely on external blockchain analytics for investigations and regulatory reporting.

Flawed data from these providers can derail active law enforcement cases and trigger wrongful customer terminations for crypto exchanges, per the published guide. The full 10-question due diligence framework is available via Chainalysis’s official blog.

Chainalysis Outlines 10 Due Diligence Questions for Blockchain Analytics Providers

The 10 questions are organized into three core categories focused on the most common failure points for blockchain analytics data quality: address grouping accuracy, entity labeling rigor, and testing and legal admissibility. Each category is designed to surface gaps in a provider’s methodology that could lead to false attribution of cryptocurrency activity to innocent parties.

Three Core Categories of Data Quality Risk

Address Grouping Verification

The first category of questions focuses on address grouping accuracy, a foundational layer of blockchain analytics data quality. Providers must disclose whether they link addresses to the same controlling entity using deterministic methods (rule-based and repeatable) or probabilistic methods (statistical and inferred).

They must also explain their mitigation processes for edge cases like CoinJoin transactions, which deliberately break standard Bitcoin UTXO co-spending heuristics used to group addresses. For example, a provider relying solely on co-spending heuristics would incorrectly group all participants in a single CoinJoin transaction as one entity, leading to false attribution of illicit funds to innocent users.
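The CoinJoin failure mode described above can be illustrated with a minimal Python sketch. It groups addresses with a union-find structure using the co-spending heuristic, and optionally skips transactions that look like CoinJoins. The transaction format, the function names, and the CoinJoin check (several distinct inputs plus several equal-valued outputs) are assumptions made for illustration; they are not Chainalysis’s method, and real detection is considerably more involved.

```python
from collections import Counter


def looks_like_coinjoin(tx, min_equal_outputs=3):
    """Crude, hypothetical check: several distinct inputs and several equal-valued outputs."""
    if len(set(tx["inputs"])) < min_equal_outputs:
        return False
    counts = Counter(tx["output_values"])
    return max(counts.values(), default=0) >= min_equal_outputs


def cluster_addresses(transactions, skip_coinjoin=True):
    """Group input addresses that are spent together, using union-find."""
    parent = {}

    def find(addr):
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]  # path halving
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for tx in transactions:
        inputs = tx["inputs"]
        for addr in inputs:
            find(addr)  # register every input address, even if never merged
        if skip_coinjoin and looks_like_coinjoin(tx):
            continue  # co-spending says nothing about shared control here
        for addr in inputs[1:]:
            union(inputs[0], addr)

    clusters = {}
    for addr in parent:
        clusters.setdefault(find(addr), set()).add(addr)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical data: one ordinary payment and one CoinJoin-like transaction.
txs = [
    {"inputs": ["A", "B"], "output_values": [5, 2]},
    {"inputs": ["C", "D", "E"], "output_values": [1, 1, 1, 0.3]},
]
print(cluster_addresses(txs))                       # [['A', 'B'], ['C'], ['D'], ['E']]
print(cluster_addresses(txs, skip_coinjoin=False))  # [['A', 'B'], ['C', 'D', 'E']]
```

With the check disabled, the three unrelated CoinJoin participants collapse into a single cluster, which is the false attribution the guide warns about.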

Entity Labeling Rigor

The second category of questions mandates that providers cite the exact evidentiary standard behind every label. This requires a clear, documented distinction between labels backed by law enforcement-seized wallet data, court records, or corroborated public sources, and those based on uncorroborated anonymous tips or unverified social media claims.

The guide stresses that address groupings must remain valid even if a label is removed. Otherwise, flawed label data can produce a common attribution error: conflating user-controlled deposit addresses with a custodial service’s internal wallets. The framework also asks providers to clarify how they differentiate between wallet users and wallet controllers, a critical distinction for nested entities that rely on third-party custodial infrastructure.
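One way to picture these two requirements is a data model in which every label carries its evidentiary basis and clusters are stored independently of labels. The Python sketch below is illustrative only: the tier names mirror the sources listed above, while the class names, fields, and sample values are hypothetical and not taken from the guide.

```python
from dataclasses import dataclass
from enum import Enum


class Evidence(Enum):
    SEIZED_WALLET = "law enforcement-seized wallet data"
    COURT_RECORD = "court record"
    CORROBORATED_PUBLIC = "corroborated public source"
    ANONYMOUS_TIP = "uncorroborated anonymous tip"
    SOCIAL_MEDIA = "unverified social media claim"


# Tiers treated as corroborated in this sketch (an assumption, not a standard).
CORROBORATED = {
    Evidence.SEIZED_WALLET,
    Evidence.COURT_RECORD,
    Evidence.CORROBORATED_PUBLIC,
}


@dataclass(frozen=True)
class Label:
    cluster_id: str
    entity: str
    evidence: Evidence
    source_ref: str  # hypothetical pointer to the supporting document


def corroborated(labels):
    """Keep only labels backed by a corroborated evidence tier."""
    return [label for label in labels if label.evidence in CORROBORATED]


# Clusters live in their own store; labels only point at a cluster ID.
clusters = {"cluster-1": {"addr1", "addr2"}, "cluster-2": {"addr3"}}
labels = [
    Label("cluster-1", "Example Exchange", Evidence.COURT_RECORD, "case-ref-placeholder"),
    Label("cluster-2", "Alleged Scam", Evidence.ANONYMOUS_TIP, "tip-placeholder"),
]

trusted = corroborated(labels)
print([label.entity for label in trusted])  # ['Example Exchange']
```

Because the grouping never reads the label list, dropping the tip-based label leaves both clusters exactly as they were.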

Testing and Legal Admissibility

The third category of questions asks whether a provider’s attribution methods have been validated against independent, seized wallet infrastructure to measure real-world accuracy. It also asks providers to confirm whether their methodology has survived scrutiny under the Daubert standard, the benchmark U.S. federal courts use to decide whether scientific evidence is admissible, in criminal or civil proceedings involving cryptocurrency evidence.

Daubert scrutiny requires courts to evaluate the testability, peer review, and error rates of scientific methods, a bar many blockchain analytics providers have not yet faced in public proceedings, per the Chainalysis guide.
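As a rough illustration of what measuring accuracy against seized infrastructure could involve, the Python sketch below compares the addresses a provider attributes to an entity with a ground-truth set recovered from a seizure. The function name and the sample addresses are hypothetical, and precision and recall are used here as generic accuracy measures; the guide does not prescribe specific metrics.

```python
def attribution_accuracy(predicted, ground_truth):
    """Compare attributed addresses with a ground-truth set of addresses."""
    true_pos = len(predicted & ground_truth)
    false_pos = len(predicted - ground_truth)
    false_neg = len(ground_truth - predicted)
    return {
        "true_positives": true_pos,
        "false_positives": false_pos,
        "false_negatives": false_neg,
        # Share of attributed addresses that really belong to the entity.
        "precision": true_pos / len(predicted) if predicted else None,
        # Share of the entity's real addresses that the provider found.
        "recall": true_pos / len(ground_truth) if ground_truth else None,
    }


# Hypothetical example: four attributed addresses, five known from a seizure.
predicted = {"a", "b", "c", "x"}
ground_truth = {"a", "b", "c", "d", "e"}
result = attribution_accuracy(predicted, ground_truth)
print(result["precision"], result["recall"])  # 0.75 0.6
```

In this toy case the one false positive, address "x", is the kind of error that would attribute activity to an innocent party.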

Accompanying Formal Ontology Codifies Industry Data Standards

The 10 questions were released alongside Chainalysis’s first formal ontology for blockchain analytics data quality, published the same day as the due diligence guide. The ontology codifies the evidentiary standards Chainalysis uses in its own internal operational work, creating a public, verifiable benchmark for the broader blockchain analytics industry to reduce costly attribution errors. The full formal data quality ontology is published on Chainalysis’s official blog.

Standardized Definitions and Core Concept Clarity

The ontology codifies the strict separation of deterministic on-chain analysis, which is rooted in verifiable on-chain evidence, from probabilistic intelligence tradecraft. It also defines standardized, industry-wide definitions for core concepts like address clustering and entity labeling to reduce confusion around provider data quality claims.

Machine Learning Output Transparency Rules

The ontology draws a hard line on machine learning use, requiring providers to clearly label all probabilistic ML outputs as separate from evidence-based deterministic conclusions to avoid misleading users into treating inferred data as proven fact. Chainalysis notes that the ontology formalizes the evidentiary standards it has used internally for years.

For example, a provider using ML to flag addresses as high-risk must clearly separate that flag from a deterministic conclusion that the address is controlled by a sanctioned entity, which requires on-chain evidence and corroborating public records.
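The separation in that example can be sketched as a record type that refuses to mix the two kinds of output. The Python below is a minimal illustration under stated assumptions: the `Finding` class, its fields, and the validation rules are hypothetical and are not drawn from the ontology itself.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Finding:
    address: str
    claim: str
    basis: str  # "deterministic" or "probabilistic"
    evidence: Tuple[str, ...] = ()
    confidence: Optional[float] = None

    def __post_init__(self):
        if self.basis == "deterministic":
            # A deterministic conclusion must cite evidence, not a score.
            if not self.evidence:
                raise ValueError("deterministic findings need cited evidence")
            if self.confidence is not None:
                raise ValueError("deterministic findings carry no ML score")
        elif self.basis == "probabilistic":
            # An ML output must always expose its score.
            if self.confidence is None or not 0.0 <= self.confidence <= 1.0:
                raise ValueError("probabilistic findings need a score in [0, 1]")
        else:
            raise ValueError(f"unknown basis: {self.basis!r}")


def split_findings(findings):
    """Report deterministic conclusions and ML flags in separate lists."""
    report = {"deterministic": [], "probabilistic": []}
    for finding in findings:
        report[finding.basis].append(finding)
    return report


sanctioned = Finding(
    "addr1",
    "controlled by a sanctioned entity",
    "deterministic",
    evidence=("on-chain evidence", "corroborating public record"),
)
ml_flag = Finding("addr2", "high-risk", "probabilistic", confidence=0.82)
report = split_findings([sanctioned, ml_flag])
print(len(report["deterministic"]), len(report["probabilistic"]))  # 1 1
```

Constructing a deterministic finding without evidence raises a `ValueError`, so an inferred flag cannot be relabeled as a proven conclusion by accident.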

Last updated: Jul 2, 2026.
Aira

Founding Editor and Publisher of ZBrandCo, covering artificial intelligence, open-source software, and the developer tools people actually use. Signal over hype: every story starts from a primary source and explains why it matters. ZBrandCo runs no paid reviews and no affiliate links. Tips and corrections: editorial@zbrandco.com.