How an Astrophysicist Uses Codex to Simulate Black Holes

Published June 14, 2026.

For decades, the world’s fastest supercomputers have been stuck on the same problem: simulating trillions of electrons and ions spiraling around magnetic field lines near a supermassive black hole. The particles corkscrew so fast that every simulation timestep must be impossibly small — turning months of compute time into a bottleneck on minuscule motion instead of the cosmic-scale behavior scientists actually need.

Chi-kwan Chan, a computational astrophysicist at the University of Arizona and member of the Event Horizon Telescope (EHT) Theory Working Group, is changing that with an unlikely collaborator: OpenAI Codex.

The Problem: Collisionless Plasma Is a Computational Nightmare

Black holes aren’t just gravity wells. They’re surrounded by hot, diffuse plasma where particles rarely collide — they spiral along magnetic field lines in tight corkscrew orbits. This “collisionless” regime requires tracking individual particle trajectories, not fluid approximations.

| Plasma Region | Behavior | Simulation Status |
|---|---|---|
| Dense plasma (accretion disk) | Frequent collisions → fluid equations work | ✅ Solved |
| Hot, diffuse corona near SMBH | No collisions, rapid gyration → must track trillions of particles | ❌ Bottleneck |

“For decades, this has limited how realistically we can simulate black hole plasma.” — Chi-kwan Chan, OpenAI case study

Standard methods must calculate every tiny gyration. Supercomputers like Frontier (Oak Ridge, 1.1 exaFLOPS) or Aurora (Argonne, 2 exaFLOPS) spend most cycles on sub-microsecond particle turns instead of the macroscopic accretion flows and jet formation that produce the emission EHT actually observes. The EHT Theory Working Group has identified this as the primary barrier to next-generation simulations since the M87* image was released in 2019.

The Codex Workflow: Human Proposes, AI Generates, Science Verifies

Chan doesn’t ask Codex for answers. He asks it for candidate algorithms — then his team tests them.

1. Define the mathematical problem & physical constraints (conservation laws, stability criteria)
2. Codex generates dozens of numerical schemes / integrator variants
3. Researchers INSPECT, TEST, and VALIDATE against known analytic solutions
4. Failed approaches discarded; working ones refined and documented
5. Verified algorithms integrated into the simulation pipeline (gravity + GRMHD codes)
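
Steps 3 and 4 can be sketched with a toy problem. The example below is a minimal illustration, not Chan's actual test suite. It assumes a 1D harmonic oscillator as the benchmark (each perpendicular velocity component of a particle in a uniform magnetic field obeys the same equation), three textbook integrators standing in for Codex-generated candidates, and a hypothetical `screen` helper with an arbitrary 5% energy-error tolerance.

```python
OMEGA = 1.0  # oscillator frequency in arbitrary units (assumed for illustration)


def forward_euler(x, v, dt):
    # Explicit Euler: both updates use the old state.
    return x + dt * v, v - dt * OMEGA**2 * x


def symplectic_euler(x, v, dt):
    # Update velocity first, then use the new velocity for the position.
    v_new = v - dt * OMEGA**2 * x
    return x + dt * v_new, v_new


def velocity_verlet(x, v, dt):
    # Half kick, full drift, half kick.
    v_half = v - 0.5 * dt * OMEGA**2 * x
    x_new = x + dt * v_half
    v_new = v_half - 0.5 * dt * OMEGA**2 * x_new
    return x_new, v_new


def energy(x, v):
    return 0.5 * v**2 + 0.5 * OMEGA**2 * x**2


def max_energy_error(step, dt=0.05, n_steps=2000):
    """Largest relative energy error seen along the trajectory."""
    x, v = 1.0, 0.0
    e0 = energy(x, v)
    worst = 0.0
    for _ in range(n_steps):
        x, v = step(x, v, dt)
        worst = max(worst, abs(energy(x, v) - e0) / e0)
    return worst


def screen(candidates, tol=0.05):
    """Hypothetical screening helper: keep schemes whose energy error stays under tol."""
    report = {name: max_energy_error(step) for name, step in candidates.items()}
    kept = [name for name, err in report.items() if err <= tol]
    return kept, report


if __name__ == "__main__":
    kept, report = screen({
        "forward_euler": forward_euler,
        "symplectic_euler": symplectic_euler,
        "velocity_verlet": velocity_verlet,
    })
    for name, err in report.items():
        print(f"{name:18s} max relative energy error = {err:.3e}")
    print("kept:", kept)
```

In this toy setup forward Euler is discarded because its energy grows without bound, while the two symplectic schemes stay inside the tolerance. A real screening pass would add more benchmarks and physical constraints than a single energy check.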

The key difference from “black-box” AI: Every Codex output is human-readable code with mathematical steps exposed. Chan’s team can see the reasoning, write unit tests, and verify against benchmark problems like Larmor orbits and Landau damping.

“We don’t accept an idea because it came from Einstein, from a bright student, or from an AI model. We accept it only after repeated testing.” — Chi-kwan Chan, OpenAI case study

This workflow mirrors how the LLNL ASC program validates new numerical methods for inertial confinement fusion — human-defined constraints, machine-generated candidates, rigorous verification.

What This Unlocks for EHT and Beyond

| If This Works… | The Payoff |
|---|---|
| Larger stable timesteps in particle-in-cell codes | Supercomputers simulate years of black hole evolution, not microseconds |
| Collisionless plasma modeling at scale | First video of M87* from EHT observations (2019 was a static image) |
| Reusable algorithm library | Other domains: fusion reactors, space weather, cosmic ray acceleration |

The Event Horizon Telescope already produced the 2019 M87* image using simulations Chan helped build. The next milestone is a movie that resolves time-variable structure in the accretion flow. That needs collisionless plasma physics at scale. EHT’s 2026 observing campaign, scheduled for March–April, aims to capture the first multi-epoch data for M87* and Sgr A*.

The Skepticism Is Real — And Chan Welcomes It

LLMs hallucinate. Numerical schemes can look correct but violate energy conservation. Chan’s framework:

| Principle | In Practice |
|---|---|
| Testability | Every algorithm gets benchmarked against analytic solutions |
| Reproducibility | Independent verification across team members |
| Physical Understanding | If the team can’t explain why it works, it doesn’t ship |
| Failure Tolerance | “Most scientific ideas fail. What matters is that these algorithms are testable.” |

This is the opposite of vibe coding. It’s AI-augmented rigor. The approach parallels how DeepMind’s AlphaFold was adopted — not because the AI was trusted blindly, but because its predictions could be validated against CASP benchmarks and experimental structures.

Why Science Is the Ideal Domain for Current AI

Chan argues science is uniquely suited for current LLMs because every output faces empirical falsification:

“Science is the perfect place for LLMs to be useful right now. Every idea gets tested. The ones that survive are the ones that work.”

The same Codex that writes buggy React code produces testable numerical schemes, because physics doesn’t care about your confidence interval. There is a loose parallel with Anthropic’s “Constitutional AI” work, which steers model behavior with explicit written principles; here the constraints are physical laws, and tests enforce them.

How to Try This Approach Yourself

Time to first experiment: ~2 hours | Cost: Free tier Codex + GitHub Actions compute

Level 1: Reproduce a Known Result (Beginner)

  1. Fork Chan’s test cases (when published) or start with a simple gyro-kinetic test problem
  2. Prompt Codex: “Generate a Boris pusher variant with adaptive timestepping for collisionless plasma”
  3. Run against Larmor orbit analytic solution — does energy conserve?
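
The Larmor orbit check in step 3 can be sketched as follows. This is a minimal example under stated assumptions, not Chan's published test case: it uses the standard non-relativistic Boris pusher with a fixed timestep (no adaptive stepping), dimensionless units with q/m = 1 and |B| = 1, and a hypothetical helper named `run_larmor_test`. It requires NumPy.

```python
import numpy as np


def boris_push(x, v, E, B, q_over_m, dt):
    """Advance position and velocity by one non-relativistic Boris step."""
    v_minus = v + 0.5 * q_over_m * E * dt      # first half electric kick
    t = 0.5 * q_over_m * B * dt                # rotation vector
    s = 2.0 * t / (1.0 + np.dot(t, t))
    v_prime = v_minus + np.cross(v_minus, t)
    v_plus = v_minus + np.cross(v_prime, s)    # magnetic rotation
    v_new = v_plus + 0.5 * q_over_m * E * dt   # second half electric kick
    return x + v_new * dt, v_new


def run_larmor_test(dt=0.05, n_steps=1000):
    """Gyrate a particle in a uniform B field; return energy error and orbit radius."""
    q_over_m, b0, v_perp = 1.0, 1.0, 1.0       # assumed dimensionless units
    E = np.zeros(3)
    B = np.array([0.0, 0.0, b0])
    r_larmor = v_perp / (q_over_m * b0)        # analytic Larmor radius
    x = np.array([0.0, r_larmor, 0.0])
    v = np.array([v_perp, 0.0, 0.0])
    e0 = 0.5 * np.dot(v, v)
    xs = np.empty((n_steps, 3))
    worst = 0.0
    for i in range(n_steps):
        x, v = boris_push(x, v, E, B, q_over_m, dt)
        xs[i] = x
        worst = max(worst, abs(0.5 * np.dot(v, v) - e0) / e0)
    # Estimate the numerical orbit radius from the extent of the orbit in x.
    radius = 0.5 * (xs[:, 0].max() - xs[:, 0].min())
    return worst, radius, r_larmor


if __name__ == "__main__":
    err, radius, r_larmor = run_larmor_test()
    print(f"max relative energy error: {err:.2e}")
    print(f"numerical radius: {radius:.4f}  analytic radius: {r_larmor:.4f}")
```

With no electric field the Boris rotation preserves the particle speed up to floating-point round-off, so the energy error should sit many orders of magnitude below the radius error. A candidate scheme that shows steady energy drift on this test is a clear discard.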

Level 2: Extend to Your Domain (Intermediate)

  1. Identify a bottleneck in your simulation (particle-in-cell, molecular dynamics, climate)
  2. Frame as: “Here’s the math constraint, here’s the current scheme, give me 10 variants”
  3. Build CI pipeline that auto-tests conservation laws on every Codex suggestion
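
A conservation check of the kind step 3 describes might look like the sketch below. It is an illustration only: `candidate_push` is a hypothetical stand-in for whatever scheme is under review (here the Boris velocity rotation), `explicit_euler_push` is a deliberately bad reference scheme, and the tolerances are arbitrary. The functions follow the `test_*` naming convention so a runner such as pytest would collect them; NumPy is required.

```python
import numpy as np


def candidate_push(v, B, q_over_m, dt):
    """Hypothetical scheme under test: Boris velocity rotation with E = 0."""
    t = 0.5 * q_over_m * B * dt
    s = 2.0 * t / (1.0 + np.dot(t, t))
    v_prime = v + np.cross(v, t)
    return v + np.cross(v_prime, s)


def explicit_euler_push(v, B, q_over_m, dt):
    """Known-bad reference: forward Euler on dv/dt = (q/m) v x B."""
    return v + q_over_m * np.cross(v, B) * dt


def speed_drift(push, n_trials=20, n_steps=500, dt=0.1, seed=0):
    """Worst relative change in |v| over random static magnetic fields.

    A static magnetic field does no work, so |v| should stay constant.
    """
    rng = np.random.default_rng(seed)
    worst = 0.0
    for _ in range(n_trials):
        v = rng.normal(size=3)
        B = rng.normal(size=3)
        speed0 = np.linalg.norm(v)
        for _ in range(n_steps):
            v = push(v, B, 1.0, dt)
        worst = max(worst, abs(np.linalg.norm(v) - speed0) / speed0)
    return worst


def test_candidate_conserves_speed():
    assert speed_drift(candidate_push) < 1e-10


def test_harness_catches_bad_scheme():
    # Guards the harness itself: a scheme known to gain energy must fail.
    assert speed_drift(explicit_euler_push) > 1e-2
```

The second test is a design choice worth keeping: a conservation harness that never fails anything gives no information, so it helps to confirm that it rejects a scheme known to be wrong.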

Level 3: Full Pipeline Integration (Advanced)

  1. Embed Codex in your build system: generate → test → benchmark → promote
  2. Track which AI-generated schemes survive 6 months of production use
  3. Share verified schemes back to the community (open science loop)

The Bigger Pattern: AI for Verifiable Science

Chan’s work illustrates a broader shift: current AI is most useful where every output faces empirical falsification, and numerical physics is exactly that kind of domain.

What to watch next: OpenAI’s “Codex for Science” program (announced alongside this case study), EHT’s 2026 observing campaign, and the first peer-reviewed paper citing AI-generated numerical schemes.



Quick Decision Framework

| If You Are… | Start Here |
|---|---|
| Student / New to PIC codes | Level 1: Reproduce Larmor orbit test in a notebook |
| Researcher with existing code | Level 2: Profile your bottleneck, frame as Codex prompt |
| Team lead / PI | Level 3: Propose AI-augmented workflow in next grant |

FAQ

Q: Does this replace human physicists?
A: No — it accelerates the algorithm discovery phase. Humans still define constraints, verify results, and interpret physical meaning.

Q: What if Codex generates a scheme that looks correct but has subtle bugs?
A: That’s why every candidate gets tested against analytic solutions (Larmor orbits, Landau damping) before integration. The verification layer catches what inspection misses.

Q: Can this work for other plasma physics problems?
A: Yes — fusion tokamaks (ITER), space weather modeling (SWPC), and cosmic ray acceleration all face similar collisionless plasma bottlenecks.

Q: When will EHT release the first black hole video?
A: Depends on 2026–2027 observing campaigns and simulation readiness. Chan’s algorithms are targeting the 2027 EHT data release cycle.

Q: Is Codex better than a human at designing numerical schemes?
A: No — it generates candidates faster. Humans still curate, test, and validate. The speedup is in exploration breadth, not insight quality.

Q: What hardware do I need to run these simulations?
A: Codex runs in the cloud. The resulting algorithms need HPC (Frontier, Aurora, or cloud HPC like AWS ParallelCluster) for production runs.


Technical Appendix: The Numerical Details

The Gyro-Kinetic Bottleneck

In collisionless plasma near a supermassive black hole, the Larmor radius (r_L = v_⊥/Ω_c) is tiny: meters to kilometers for electrons, tens of kilometers for protons. The cyclotron frequency Ω_c = qB/(mc) is so high that particles complete millions to billions of orbits per second. Standard particle-in-cell (PIC) codes must resolve each orbit with timesteps Δt ≪ 1/Ω_c.

For a 10^4 M_⊙ black hole with B ~ 10^3 G near the horizon:
Electron cyclotron period: ~10^-10 s (1/Ω_c ≈ 6 × 10^-11 s for non-relativistic electrons)
Required timestep: ~10^-12 s
Accretion timescale: ~10^3 s (about 17 minutes)
Ratio: ~10^15 steps needed, which is impractical even on exascale machines
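
The arithmetic behind these order-of-magnitude figures can be reproduced in a few lines. The sketch below assumes non-relativistic electrons and Gaussian units; the field strength, timestep, and accretion timescale are the values listed above, and the physical constants are rounded to four significant figures.

```python
import math

# Physical constants in Gaussian (CGS) units, rounded.
E_CHARGE = 4.803e-10     # electron charge magnitude, esu
M_ELECTRON = 9.109e-28   # electron mass, g
C_LIGHT = 2.998e10       # speed of light, cm/s


def cyclotron_frequency(b_gauss, charge=E_CHARGE, mass=M_ELECTRON):
    """Non-relativistic angular cyclotron frequency, Omega_c = qB/(mc), in rad/s."""
    return charge * b_gauss / (mass * C_LIGHT)


B_FIELD = 1.0e3          # G, value quoted in the text
TIMESTEP = 1.0e-12       # s, value quoted in the text
T_ACCRETION = 1.0e3      # s, value quoted in the text

omega_c = cyclotron_frequency(B_FIELD)
period = 2.0 * math.pi / omega_c
n_steps = T_ACCRETION / TIMESTEP

if __name__ == "__main__":
    print(f"Omega_c        = {omega_c:.3e} rad/s")
    print(f"1 / Omega_c    = {1.0 / omega_c:.3e} s")
    print(f"gyro period    = {period:.3e} s")
    print(f"steps required = {n_steps:.1e}")
```

For these inputs Ω_c comes out near 1.8 × 10^10 rad/s, so a timestep of 10^-12 s resolves 1/Ω_c with a margin of a few tens. Relativistic electrons gyrate more slowly by their Lorentz factor, which this estimate ignores.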

Candidate Approaches Codex Generated

| Approach | Core Idea | Status |
|---|---|---|
| Guiding-center reduction | Average over fast gyro-motion; track guiding center drift | ✓ Verified for weak-field test cases |
| Adaptive Boris pusher | Dynamically adjust Δt based on local B-field strength | ✓ Energy-conserving in benchmarks |
| Implicit moment method | Solve for moments (density, velocity) instead of particles | ✗ Failed Landau damping test |
| Neural operator surrogate | Train a neural net to predict the particle push instead of integrating it explicitly | ✗ Failed reproducibility |

The guiding-center reduction combined with adaptive Boris push showed the most promise — it reduces the effective cyclotron frequency by ~10^3×, making the problem tractable on current HPC.
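
To make the guiding-center idea concrete, the sketch below compares a full-orbit Boris integration with a guiding-center update in the simplest case: uniform, static, crossed E and B fields, where the guiding center moves at the E×B drift velocity. This is a textbook illustration, not the scheme Chan's team verified. It assumes dimensionless units (q/m = 1, with c absorbed into the field units), and the helper names are hypothetical. NumPy is required.

```python
import numpy as np


def boris_push(x, v, E, B, q_over_m, dt):
    """One non-relativistic Boris step (full particle orbit)."""
    v_minus = v + 0.5 * q_over_m * E * dt
    t = 0.5 * q_over_m * B * dt
    s = 2.0 * t / (1.0 + np.dot(t, t))
    v_prime = v_minus + np.cross(v_minus, t)
    v_plus = v_minus + np.cross(v_prime, s)
    v_new = v_plus + 0.5 * q_over_m * E * dt
    return x + v_new * dt, v_new


def exb_drift(E, B):
    """E x B drift velocity of the guiding center in uniform fields."""
    return np.cross(E, B) / np.dot(B, B)


def full_orbit_displacement(E, B, dt, t_end, q_over_m=1.0):
    """Follow every gyration of a particle that starts at rest."""
    x, v = np.zeros(3), np.zeros(3)
    n_steps = int(round(t_end / dt))
    for _ in range(n_steps):
        x, v = boris_push(x, v, E, B, q_over_m, dt)
    return x, n_steps


def guiding_center_displacement(E, B, dt, t_end):
    """Advance only the guiding center; the gyration is averaged out."""
    X = np.zeros(3)
    v_drift = exb_drift(E, B)
    n_steps = int(round(t_end / dt))
    for _ in range(n_steps):
        X = X + v_drift * dt
    return X, n_steps


E_FIELD = np.array([0.0, 0.1, 0.0])
B_FIELD = np.array([0.0, 0.0, 1.0])
T_END = 200.0

# The full orbit must resolve the gyration (Omega_c * dt = 0.05);
# the guiding-center update can take steps 200 times larger.
x_full, n_full = full_orbit_displacement(E_FIELD, B_FIELD, dt=0.05, t_end=T_END)
x_gc, n_gc = guiding_center_displacement(E_FIELD, B_FIELD, dt=10.0, t_end=T_END)

if __name__ == "__main__":
    print(f"full orbit:     {n_full} steps, x displacement = {x_full[0]:.3f}")
    print(f"guiding center: {n_gc} steps, x displacement = {x_gc[0]:.3f}")
```

The two displacements along the drift direction agree to within about one Larmor radius, while the guiding-center version takes far fewer steps. Uniform fields make this the easy case; in realistic geometries the drift equations pick up gradient and curvature terms, and the reduction is valid only where the fields vary slowly over a gyro-orbit.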

Comparison With Other AI Coding Tools

| Tool | Strength For This Use Case | Limitation |
|---|---|---|
| OpenAI Codex | Generates inspectable numerical schemes; integrated with GitHub | Requires subscription; cloud-only |
| Claude Code | Strong at mathematical reasoning; local via API | Not specialized for scientific code gen |
| Cursor/Copilot | Good for boilerplate; weak on novel algorithms | Limited context for large PIC codes |
| Local LLMs (Ollama) | Private, free; runs offline | Smaller models struggle with novel numerics |

Why Codex won here: The OpenAI Science team provided direct support, and the integration with GitHub Actions made CI validation seamless. For independent researchers, local models are catching up — see our Local LLM Guide.


Sources:
1. OpenAI — How an astrophysicist uses Codex to help simulate black holes (June 11, 2026) — primary case study
2. Event Horizon Telescope Collaboration — First M87* Image (2019) — context on 2019 achievement
3. Frontier Supercomputer Specifications — compute context
4. EHT Theory Working Group Publications — simulation requirements


Bottom Line: The first black hole movie isn’t a telescope problem — it’s a plasma simulation problem. Chan’s Codex workflow is the first credible path to solving it. If you work in computational physics, the barrier to entry is ~2 hours and a GitHub Actions account. Try it.

Last updated: Jun 14, 2026.
Aira

Founding Editor and Publisher of ZBrandCo, covering artificial intelligence, open-source software, and the developer tools people actually use. Signal over hype: every story starts from a primary source and explains why it matters. ZBrandCo runs no paid reviews and no affiliate links. Tips and corrections: editorial@zbrandco.com.