An AI reasoning workflow built on OpenAI’s o3 Deep Research model helped clinicians reach diagnoses in 18 previously unsolved rare genetic disease cases in children, according to a new study published in NEJM AI (OpenAI). Reanalyzing 376 cases that earlier specialist review had left unsolved produced an additional diagnostic yield of 4.8% (OpenAI).
Per the study, 50% of patients with rare diseases remain undiagnosed even after completing comprehensive genomic sequencing and specialist evaluation. Fragmented clinical records, millions of possible genetic variants, and rapidly evolving scientific literature create barriers to identifying causal links between genetic markers and patient symptoms (OpenAI). As new gene-disease relationships and variant classifications emerge over time, previously inconclusive test results can become newly interpretable, but most health systems lack scalable processes to revisit old cases (OpenAI).
Using AI to diagnose rare childhood genetic diseases: workflow design
A joint team of researchers and clinicians from the Manton Center for Orphan Disease Research at Boston Children’s Hospital, Harvard University, and OpenAI designed the workflow to act as an explanation-focused reasoning layer that sits atop standard clinical genomic analysis pipelines, rather than operating as a standalone diagnostic tool (OpenAI). Instead of returning only a ranked list of candidate genes, the model was prompted to connect patient clinical features, inheritance patterns, variant evidence, and relevant scientific literature into a justification that human reviewers could interrogate (OpenAI).
For each of the 376 de-identified cases, the team assembled a standardized input packet. Each packet included Human Phenotype Ontology terms describing the patient’s clinical presentation, redacted clinician notes, age and gender metadata, and a filtered variant table listing each variant’s rarity, predicted protein impact, ClinVar classification, and signal strength across family members (OpenAI). Most packets included genomic data from the affected child and both biological parents (OpenAI).
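The packet described above can be pictured as a small data structure. The sketch below is illustrative only: the study's actual schema is not given here, so every class name, field name, gene label, and the 0.001 frequency cutoff is a hypothetical stand-in.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Variant:
    """One row of the filtered variant table (field names are hypothetical)."""
    gene: str
    population_frequency: float            # rarity: lower means rarer
    predicted_impact: str                  # e.g. "missense", "frameshift"
    clinvar_classification: Optional[str]  # None when ClinVar has no entry
    carriers: List[str]                    # family members carrying the variant


@dataclass
class CasePacket:
    """Standardized, de-identified input for one case (hypothetical layout)."""
    case_id: str
    hpo_terms: List[str]                   # Human Phenotype Ontology IDs
    redacted_notes: str
    age_years: float
    gender: str
    variants: List[Variant] = field(default_factory=list)


def rare_variants(packet: CasePacket, max_frequency: float = 0.001) -> List[Variant]:
    """Keep variants at or below an illustrative rarity cutoff, rarest first."""
    kept = [v for v in packet.variants if v.population_frequency <= max_frequency]
    return sorted(kept, key=lambda v: v.population_frequency)


# Made-up example data; GENE_A/B/C are placeholders, not real findings.
packet = CasePacket(
    case_id="CASE-001",
    hpo_terms=["HP:0001250", "HP:0001263"],  # seizure, global developmental delay
    redacted_notes="Clinician notes with identifiers removed.",
    age_years=6.0,
    gender="female",
    variants=[
        Variant("GENE_A", 0.00001, "frameshift", None, ["proband"]),
        Variant("GENE_B", 0.02, "missense", "benign", ["proband", "mother"]),
        Variant("GENE_C", 0.0005, "missense", None, ["proband", "father"]),
    ],
)

print([v.gene for v in rare_variants(packet)])
```

Holding the phenotype terms, notes, and variant rows in one record mirrors the article's point that the model reasons over all of them together rather than over the variant table alone.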
Validation on known cases and confidence scoring
Before testing the workflow on unsolved cases, the research team validated its performance on three cohorts of solved cases: 51 cases with pre-existing confirmed rare disease diagnoses, 57 cases of neuromuscular disorders, and 15 cases with long-read genome sequencing data (OpenAI). In duplicate test runs on the 51-case confirmed rare disease cohort, the model correctly identified the causal gene and variant in 48 cases. It also returned the correct diagnosis for 45 of the 57 neuromuscular cases and identified the correct gene in all 15 long-read genome cases, including both disease-causing alleles in 12 of those 15 (OpenAI).
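Expressed as percentages, the validation counts above work out as follows. This is plain arithmetic on the figures quoted in this article, and the cohort labels are shorthand introduced here.

```python
# (correct, total) pairs taken from the validation figures quoted above.
validation = {
    "confirmed rare disease: gene and variant": (48, 51),
    "neuromuscular: diagnosis": (45, 57),
    "long-read genome: gene": (15, 15),
    "long-read genome: both alleles": (12, 15),
}


def percent(correct: int, total: int) -> float:
    """Share of correct calls, as a percentage rounded to one decimal."""
    return round(100 * correct / total, 1)


rates = {name: percent(c, t) for name, (c, t) in validation.items()}

for name, rate in rates.items():
    print(f"{name}: {rate}%")
# confirmed rare disease: gene and variant: 94.1%
# neuromuscular: diagnosis: 78.9%
# long-read genome: gene: 100.0%
# long-read genome: both alleles: 80.0%
```

The rates are kept per cohort rather than pooled, since the cohorts differ in disease type, sequencing method, and in what counted as a correct call.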
The model’s self-reported confidence scores aligned closely with its real-world accuracy in these validation tests. The mean minimum confidence score for consistently correct diagnostic calls was 85.6, compared with 42.1 for calls that were incorrect or unconfirmed (OpenAI). The research team stressed that these scores are not calibrated probabilities and are not intended to replace clinical evidence or expert review, but they do help human reviewers prioritize the most promising candidate diagnoses for further investigation.
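One way such scores could be used for prioritization is to order the review queue by each candidate's lowest score across repeated runs. The sketch below assumes that "minimum confidence" means the lowest score a candidate received across duplicate runs, which the article does not spell out; the function names and example scores are invented for illustration.

```python
from typing import Dict, List, Tuple


def minimum_confidence(scores_by_candidate: Dict[str, List[float]]) -> Dict[str, float]:
    """Lowest self-reported score per candidate across repeated runs (assumed meaning)."""
    return {name: min(scores) for name, scores in scores_by_candidate.items()}


def review_order(scores_by_candidate: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Order candidates for human review, highest minimum score first.

    Nothing is discarded: a low score only moves a candidate later in the queue,
    because the scores are not calibrated probabilities.
    """
    minimums = minimum_confidence(scores_by_candidate)
    return sorted(minimums.items(), key=lambda item: item[1], reverse=True)


# Made-up scores from two runs per candidate.
runs = {
    "candidate_A": [91.0, 88.0],
    "candidate_B": [60.0, 35.0],
    "candidate_C": [75.0, 80.0],
}

queue = review_order(runs)
print(queue)
# [('candidate_A', 88.0), ('candidate_C', 75.0), ('candidate_B', 35.0)]
```

Using the minimum rather than the mean rewards candidates the model scores highly on every run, which fits the article's contrast between consistently correct calls and the rest.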
Confirmed diagnoses from previously unsolved cases
The team applied the validated workflow to four distinct groups of long-unsolved cases: pediatric patients with neurodevelopmental disorders, individuals with rare neuromuscular conditions, children and teenagers with early-stage psychosis, and pediatric cases of sudden unexplained death (OpenAI). The 18 newly confirmed diagnoses were distributed across all four of these cohorts (OpenAI).
All 376 cases had already been analyzed by multiple commercial and institutional genomic pipelines and reviewed by multidisciplinary specialist teams, and many had remained undiagnosed for years before the study (OpenAI). At no point was any model output treated as a formal clinical diagnosis (OpenAI).
Every candidate explanation generated by the workflow had to clear three steps before a result could be returned to a patient’s family: review by at least two study team members, classification of the candidate variant as pathogenic or likely pathogenic under the ACMG/AMP framework used by clinical diagnostic laboratories, and confirmation by a CLIA-certified lab (OpenAI). Final sign-off from the patient’s treating clinical team was also required, and any reviewer disagreements were resolved by consensus (OpenAI).
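The requirements in the paragraph above amount to a checklist in which every item must pass. A minimal sketch of that gate follows; the record layout and function names are hypothetical and do not describe the study's actual tooling.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# ACMG/AMP classes the article says were required before a result was returned.
RETURNABLE_CLASSES = {"pathogenic", "likely pathogenic"}


@dataclass
class ReviewRecord:
    """Review status for one candidate explanation (hypothetical fields)."""
    reviewers: List[str] = field(default_factory=list)
    acmg_amp_class: Optional[str] = None
    clia_confirmed: bool = False
    clinical_team_signoff: bool = False
    open_disagreements: int = 0


def unmet_requirements(record: ReviewRecord) -> List[str]:
    """List every requirement that still blocks return of the result."""
    unmet = []
    if len(set(record.reviewers)) < 2:
        unmet.append("review by at least two study team members")
    if (record.acmg_amp_class or "").lower() not in RETURNABLE_CLASSES:
        unmet.append("ACMG/AMP class of pathogenic or likely pathogenic")
    if not record.clia_confirmed:
        unmet.append("confirmation by a CLIA-certified lab")
    if not record.clinical_team_signoff:
        unmet.append("sign-off from the treating clinical team")
    if record.open_disagreements > 0:
        unmet.append("consensus on reviewer disagreements")
    return unmet


def can_return_to_family(record: ReviewRecord) -> bool:
    """True only when no requirement is outstanding."""
    return not unmet_requirements(record)


complete = ReviewRecord(["reviewer_1", "reviewer_2"], "Likely pathogenic", True, True)
partial = ReviewRecord(["reviewer_1"], "Uncertain significance", True, False)

print(can_return_to_family(complete))   # True
print(unmet_requirements(partial))
```

Returning the list of unmet items, rather than a bare yes or no, shows reviewers exactly which step is still outstanding.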
After this full clinical validation process, the team confirmed 18 new diagnoses among the 376 reanalyzed cases, the 4.8% additional yield reported for the reanalysis effort. The findings were published June 18, 2026, in NEJM AI (OpenAI).
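The headline yield can be checked directly from the two counts reported in this article. The variable names below are chosen for illustration.

```python
confirmed_diagnoses = 18
reanalyzed_cases = 376

# Additional diagnostic yield as a percentage of all reanalyzed cases.
additional_yield = 100 * confirmed_diagnoses / reanalyzed_cases

# Equivalent framing: how many cases were reanalyzed per new diagnosis.
cases_per_diagnosis = reanalyzed_cases / confirmed_diagnoses

print(f"{additional_yield:.1f}%")      # 4.8%
print(f"{cases_per_diagnosis:.1f}")    # 20.9
```

Put another way, the effort produced roughly one new confirmed diagnosis for every 21 cases reanalyzed.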
Implications for rare disease care and AI deployment
The study highlights a widespread, underaddressed operational gap in rare disease care. Genomic reanalysis carries both a scientific and an administrative burden: new gene-disease links, variant reclassifications, and published case reports accumulate constantly, while most clinical labs lack the staffing to revisit old inconclusive test results (OpenAI). The AI-assisted workflow offers a potential path to scaling periodic reanalysis without replacing specialist judgment, because the model only generates evidence-linked hypotheses for human experts to validate.
The research team emphasized the workflow is a research tool, not a deployed clinical product, and all findings still require full clinical validation before being returned to families (OpenAI). The 4.8% additional diagnostic yield from already-reviewed cases underscores how much untapped diagnostic value remains locked in old genomic data as scientific knowledge advances.
This gap matters most for the 50% of rare disease patients who remain undiagnosed after initial genomic testing, a population that scalable, guardrailed AI tools could help serve.
Bottom line: The study demonstrates a viable, guardrailed AI workflow to scale reanalysis of unsolved rare pediatric genetic cases, delivering 18 new confirmed diagnoses and a 4.8% additional diagnostic yield from 376 previously reviewed cases, though all findings still require full clinical validation via ACMG/AMP classification and CLIA-certified lab confirmation before being returned to families.
