How an Astrophysicist Uses Codex to Simulate Black Holes

Aira · Published Jun 14, 2026 · 7 min read

For decades, the world’s fastest supercomputers have been stuck on the same problem: simulating trillions of electrons and ions spiraling around magnetic field lines near a supermassive black hole. The particles corkscrew so fast that every simulation timestep must be extremely small, so months of compute time go to resolving minuscule gyrations instead of the cosmic-scale behavior scientists actually need.

Chi-kwan Chan, a computational astrophysicist at the University of Arizona and member of the Event Horizon Telescope (EHT) Theory Working Group, is changing that with an unlikely collaborator: OpenAI Codex.

The Problem: Collisionless Plasma Is a Computational Nightmare

Black holes aren’t just gravity wells. They’re surrounded by hot, diffuse plasma where particles rarely collide — they spiral along magnetic field lines in tight corkscrew orbits. This “collisionless” regime requires tracking individual particle trajectories, not fluid approximations.

Plasma Region	Behavior	Simulation Status
Dense plasma (accretion disk)	Frequent collisions → fluid equations work	✅ Solved
Hot, diffuse corona near SMBH	No collisions, rapid gyration → must track trillions of particles	❌ Bottleneck

“For decades, this has limited how realistically we can simulate black hole plasma.” — Chi-kwan Chan, OpenAI case study

Standard methods must resolve every tiny gyration. Supercomputers like Frontier (Oak Ridge, 1.1 exaFLOPS) or Aurora (Argonne, 2 exaFLOPS) spend most of their cycles on sub-microsecond particle gyrations instead of the macroscopic accretion flows and jet formation that produce the emission EHT actually observes. The EHT Theory Working Group has regarded this as the primary barrier to next-generation simulations since the M87* image was released in 2019.

The Codex Workflow: Human Proposes, AI Generates, Science Verifies

Chan doesn’t ask Codex for answers. He asks it for candidate algorithms — then his team tests them.

1. Define the mathematical problem & physical constraints (conservation laws, stability criteria)
2. Codex generates dozens of numerical schemes / integrator variants
3. Researchers INSPECT, TEST, and VALIDATE against known analytic solutions
4. Failed approaches discarded; working ones refined and documented
5. Verified algorithms integrated into the simulation pipeline (gravity + GRMHD codes)
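
Steps 2 to 4 amount to a screening loop: candidates go in, and only those that match a known solution come out. The sketch below is a toy illustration, not Chan's code. The two candidate integrators, the test problem (velocity rotation in a uniform magnetic field, with Ω = 1 in arbitrary units), and the tolerance are all assumptions chosen for illustration.

```python
import math

OMEGA = 1.0  # cyclotron frequency in arbitrary units (assumed for this toy problem)


def euler_step(vx, vy, dt):
    """Candidate 1: forward Euler for dv/dt = OMEGA * (vy, -vx)."""
    return vx + OMEGA * vy * dt, vy - OMEGA * vx * dt


def boris_step(vx, vy, dt):
    """Candidate 2: Boris rotation for the same equation (B along z, no E field)."""
    t = 0.5 * OMEGA * dt
    s = 2.0 * t / (1.0 + t * t)
    vpx = vx + vy * t          # v' = v + v x t
    vpy = vy - vx * t
    return vx + vpy * s, vy - vpx * s  # v+ = v + v' x s


def max_error_vs_analytic(step, dt=0.01, n_steps=628):
    """Largest deviation from the analytic solution v(t) = (cos(OMEGA t), -sin(OMEGA t))."""
    vx, vy = 1.0, 0.0
    worst = 0.0
    for n in range(1, n_steps + 1):
        vx, vy = step(vx, vy, dt)
        t = n * dt
        exact_x, exact_y = math.cos(OMEGA * t), -math.sin(OMEGA * t)
        worst = max(worst, math.hypot(vx - exact_x, vy - exact_y))
    return worst


def screen(candidates, tol=1e-3):
    """Return {name: passed} for each candidate integrator."""
    return {name: max_error_vs_analytic(fn) <= tol for name, fn in candidates.items()}


if __name__ == "__main__":
    for name, passed in screen({"forward_euler": euler_step,
                                "boris_rotation": boris_step}).items():
        print(name, "kept" if passed else "discarded")
```

Forward Euler is included as a candidate that looks plausible but steadily gains energy, so the harness has something to reject. A real screen would use the team's own benchmark problems and tolerances.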

The key difference from “black-box” AI: Every Codex output is human-readable code with mathematical steps exposed. Chan’s team can see the reasoning, write unit tests, and verify against benchmark problems like Larmor orbits and Landau damping.

“We don’t accept an idea because it came from Einstein, from a bright student, or from an AI model. We accept it only after repeated testing.” — Chi-kwan Chan, OpenAI case study

This workflow mirrors how the LLNL ASC program validates new numerical methods for inertial confinement fusion — human-defined constraints, machine-generated candidates, rigorous verification.

What This Unlocks for EHT and Beyond

If This Works…	The Payoff
Larger stable timesteps in particle-in-cell codes	Supercomputers simulate years of black hole evolution, not microseconds
Collisionless plasma modeling at scale	First video of M87* from EHT observations (2019 was a static image)
Reusable algorithm library	Other domains: fusion reactors, space weather, cosmic ray acceleration

The Event Horizon Telescope already released the 2019 M87* image, which was interpreted with the help of simulations Chan helped build. The next milestone is a movie that resolves time-variable structure in the accretion flow. That needs collisionless plasma physics at scale. EHT’s 2026 observing campaign, scheduled for March–April, aims to capture the first multi-epoch data for M87* and Sgr A*.

The Skepticism Is Real — And Chan Welcomes It

LLMs hallucinate. Numerical schemes can look correct but violate energy conservation. Chan’s framework:

Principle	In Practice
Testability	Every algorithm gets benchmarked against analytic solutions
Reproducibility	Independent verification across team members
Physical Understanding	If the team can’t explain why it works, it doesn’t ship
Failure Tolerance	“Most scientific ideas fail. What matters is that these algorithms are testable.”

This is the opposite of vibe coding. It’s AI-augmented rigor. The approach parallels how DeepMind’s AlphaFold was adopted — not because the AI was trusted blindly, but because its predictions could be validated against CASP benchmarks and experimental structures.

Why Science Is the Ideal Domain for Current AI

Chan argues science is uniquely suited for current LLMs because every output faces empirical falsification:

“Science is the perfect place for LLMs to be useful right now. Every idea gets tested. The ones that survive are the ones that work.”

The same Codex that can write buggy React code can also produce numerical schemes worth keeping, because each scheme is tested against physics, and physics is indifferent to how confident the model sounds. The broader point is that model output is most useful where it can be checked against explicit, verifiable criteria.

How to Try This Approach Yourself

Time to first experiment: ~2 hours | Cost: Free tier Codex + GitHub Actions compute

Level 1: Reproduce a Known Result (Beginner)

1. Fork Chan’s test cases (when published) or start with a simple gyro-kinetic test problem.
2. Prompt Codex: “Generate a Boris pusher variant with adaptive timestepping for collisionless plasma.”
3. Run the result against the analytic Larmor orbit solution and check whether energy is conserved.
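
A Larmor orbit check of this kind can be written in a few dozen lines. The following is a minimal sketch assuming a nonrelativistic particle, a uniform field along z, and units in which dv/dt = (q/m)(E + v × B) with all constants absorbed into `q_over_m`. The function names and parameter values are illustrative, not taken from Chan's test cases.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def boris_push(x, v, e, b, q_over_m, dt):
    """One step of the standard Boris scheme: half kick, rotation, half kick, drift."""
    half = 0.5 * q_over_m * dt
    v_minus = tuple(vi + half * ei for vi, ei in zip(v, e))
    t = tuple(half * bi for bi in b)
    t2 = sum(ti * ti for ti in t)
    s = tuple(2.0 * ti / (1.0 + t2) for ti in t)
    v_prime = tuple(vm + c for vm, c in zip(v_minus, cross(v_minus, t)))
    v_plus = tuple(vm + c for vm, c in zip(v_minus, cross(v_prime, s)))
    v_new = tuple(vp + half * ei for vp, ei in zip(v_plus, e))
    x_new = tuple(xi + dt * vi for xi, vi in zip(x, v_new))
    return x_new, v_new


def larmor_orbit_test(dt=0.05, n_orbits=10):
    """Return (relative energy drift, measured gyroradius) for a uniform-B orbit."""
    q_over_m = 1.0
    b = (0.0, 0.0, 1.0)
    e = (0.0, 0.0, 0.0)
    x = (0.0, 1.0, 0.0)
    v = (1.0, 0.0, 0.0)
    omega_c = q_over_m * b[2]
    n_steps = int(round(n_orbits * 2.0 * math.pi / (omega_c * dt)))
    energy0 = sum(vi * vi for vi in v)
    x_min = x_max = x[0]
    for _ in range(n_steps):
        x, v = boris_push(x, v, e, b, q_over_m, dt)
        x_min = min(x_min, x[0])
        x_max = max(x_max, x[0])
    energy_drift = abs(sum(vi * vi for vi in v) - energy0) / energy0
    radius = 0.5 * (x_max - x_min)  # half the orbit's extent in x
    return energy_drift, radius


if __name__ == "__main__":
    drift, radius = larmor_orbit_test()
    print(f"relative energy drift: {drift:.2e}")
    print(f"measured radius: {radius:.4f} (analytic v_perp/Omega_c = 1.0)")
```

With no electric field the Boris rotation preserves speed to rounding error, so a candidate variant that shows real energy drift on this test has changed something it should not have. The measured radius differs slightly from the analytic value because of the finite timestep.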

Level 2: Extend to Your Domain (Intermediate)

1. Identify a bottleneck in your simulation (particle-in-cell, molecular dynamics, climate).
2. Frame it as: “Here’s the math constraint, here’s the current scheme, give me 10 variants.”
3. Build a CI pipeline that automatically tests conservation laws on every Codex suggestion.
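
One way to structure the conservation-law gate is a table of invariants that every candidate must preserve. The sketch below assumes a uniform magnetic field along z and units with q/m = 1. The invariant list, the pusher signature `push(v, omega, dt)`, and the deliberately broken `damped_push` are all hypothetical, included only to show a check that fails.

```python
import math


def exact_rotation_push(v, omega, dt):
    """Reference pusher: rotate (vx, vy) by the exact angle omega*dt; vz is unchanged."""
    c, s = math.cos(omega * dt), math.sin(omega * dt)
    vx, vy, vz = v
    return (c * vx + s * vy, -s * vx + c * vy, vz)


def damped_push(v, omega, dt):
    """Deliberately broken candidate: the rotation plus a small spurious damping."""
    vx, vy, vz = exact_rotation_push(v, omega, dt)
    return (0.999 * vx, 0.999 * vy, vz)


# Quantities that should stay constant in a uniform, static magnetic field.
INVARIANTS = {
    "kinetic_energy": lambda v, b: 0.5 * (v[0] ** 2 + v[1] ** 2 + v[2] ** 2),
    "magnetic_moment": lambda v, b: 0.5 * (v[0] ** 2 + v[1] ** 2) / b,
    "parallel_velocity": lambda v, b: v[2],
}


def violated_invariants(push, b=2.0, dt=0.1, n_steps=500, tol=1e-8):
    """Run a candidate pusher and return the names of invariants it fails to conserve."""
    v = (1.0, 0.0, 0.5)
    reference = {name: fn(v, b) for name, fn in INVARIANTS.items()}
    for _ in range(n_steps):
        v = push(v, b, dt)  # omega equals b in units with q/m = 1
    failed = []
    for name, fn in INVARIANTS.items():
        if abs(fn(v, b) - reference[name]) > tol * abs(reference[name]):
            failed.append(name)
    return failed


if __name__ == "__main__":
    for name, push in [("exact_rotation", exact_rotation_push),
                       ("damped", damped_push)]:
        print(name, "failed:", violated_invariants(push))
```

In a CI job, a script like this would typically exit with a non-zero status whenever `violated_invariants` returns a non-empty list, which blocks the suggestion from being merged.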

Level 3: Full Pipeline Integration (Advanced)

1. Embed Codex in your build system: generate → test → benchmark → promote.
2. Track which AI-generated schemes survive 6 months of production use.
3. Share verified schemes back with the community (open science loop).

The Bigger Pattern: AI for Verifiable Science

Chan’s work illustrates a broader shift: current AI is most useful where every output can be checked. A generated numerical scheme either passes its benchmarks or it does not, and that is a property of the physics, not of how confident the model sounds.

What to watch next: OpenAI’s “Codex for Science” program (announced alongside this case study), EHT’s 2026 observing campaign, and the first peer-reviewed paper citing AI-generated numerical schemes.

Quick Decision Framework

If You Are…	Start Here
Student / New to PIC codes	Level 1: Reproduce Larmor orbit test in a notebook
Researcher with existing code	Level 2: Profile your bottleneck, frame as Codex prompt
Team lead / PI	Level 3: Propose AI-augmented workflow in next grant

FAQ

Q: Does this replace human physicists?
A: No — it accelerates the algorithm discovery phase. Humans still define constraints, verify results, and interpret physical meaning.

Q: What if Codex generates a scheme that looks correct but has subtle bugs?
A: That’s why every candidate gets tested against analytic solutions (Larmor orbits, Landau damping) before integration. The verification layer catches what inspection misses.

Q: Can this work for other plasma physics problems?
A: Yes — fusion tokamaks (ITER), space weather modeling (SWPC), and cosmic ray acceleration all face similar collisionless plasma bottlenecks.

Q: When will EHT release the first black hole video?
A: Depends on 2026–2027 observing campaigns and simulation readiness. Chan’s algorithms are targeting the 2027 EHT data release cycle.

Q: Is Codex better than a human at designing numerical schemes?
A: No — it generates candidates faster. Humans still curate, test, and validate. The speedup is in exploration breadth, not insight quality.

Q: What hardware do I need to run these simulations?
A: Codex runs in the cloud. The resulting algorithms need HPC (Frontier, Aurora, or cloud HPC like AWS ParallelCluster) for production runs.

Technical Appendix: The Numerical Details

The Gyro-Kinetic Bottleneck

In collisionless plasma near a supermassive black hole, the Larmor radius (r_L = v_⊥/Ω_c) is tiny compared with the system size: meters to kilometers for electrons, tens of kilometers for protons. The cyclotron frequency Ω_c = qB/(mc), written here in Gaussian units, means particles complete millions of orbits per second or more. Standard particle-in-cell (PIC) codes must resolve each orbit with timesteps Δt ≪ 1/Ω_c.

For a 10^4 M_⊙ black hole with B ~ 10^3 G near the horizon:
– Electron gyration time 1/Ω_c: ~6×10^-11 s (full cyclotron period 2π/Ω_c ≈ 4×10^-10 s)
– Required timestep: ~10^-12 s
– Accretion timescale: ~10^3 s (about 17 minutes)
– Ratio: ~10^15 steps needed, out of reach even on exascale machines
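
The arithmetic behind this list can be reproduced directly from Ω_c = qB/(mc). The short script below uses rounded CGS constants and the field strength, timestep, and accretion timescale quoted above. The function names are illustrative.

```python
import math

# Rounded physical constants in Gaussian (CGS) units.
E_CHARGE = 4.8032e-10    # electron charge magnitude, esu
M_ELECTRON = 9.1094e-28  # electron mass, g
C_LIGHT = 2.9979e10      # speed of light, cm/s


def electron_cyclotron_frequency(b_gauss):
    """Angular cyclotron frequency Omega_c = e B / (m_e c), in rad/s."""
    return E_CHARGE * b_gauss / (M_ELECTRON * C_LIGHT)


def steps_required(total_time_s, dt_s):
    """Number of fixed timesteps needed to cover total_time_s."""
    return total_time_s / dt_s


if __name__ == "__main__":
    omega_c = electron_cyclotron_frequency(1.0e3)  # B ~ 10^3 G
    print(f"Omega_c           = {omega_c:.3e} rad/s")
    print(f"1/Omega_c         = {1.0 / omega_c:.2e} s")
    print(f"cyclotron period  = {2.0 * math.pi / omega_c:.2e} s")
    print(f"steps for 10^3 s  = {steps_required(1.0e3, 1.0e-12):.1e}")
```

These are nonrelativistic values. For relativistic particles the gyration frequency drops by the Lorentz factor, which changes the numbers but not the scale of the mismatch.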

Candidate Approaches Codex Generated

Approach	Core Idea	Status
Guiding-center reduction	Average over fast gyro-motion; track guiding center drift	✓ Verified for weak-field test cases
Adaptive Boris pusher	Dynamically adjust Δt based on local B-field strength	✓ Energy-conserving in benchmarks
Implicit moment method	Solve for moments (density, velocity) instead of particles	✗ Failed Landau damping test
Neural operator surrogate	Train a neural net to predict the particle push instead of integrating explicitly	✗ Failed reproducibility
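
To make the "adaptive Boris pusher" row concrete, here is one simple way a timestep can follow the local field strength. This is an assumed construction for illustration, not the scheme Codex generated: the field profile `b_z`, the `fraction` parameter, and the function names are hypothetical.

```python
import math


def b_z(x):
    """Hypothetical field profile: B_z increases with |x| (arbitrary units)."""
    return 1.0 + 0.5 * abs(x)


def adaptive_boris_step(x, y, vx, vy, q_over_m=1.0, fraction=0.1):
    """One Boris rotation with dt set to a fixed fraction of 1/Omega_c at the particle."""
    field = b_z(x)
    omega_c = abs(q_over_m) * field
    dt = fraction / omega_c            # stronger field -> smaller step
    t = 0.5 * q_over_m * field * dt
    s = 2.0 * t / (1.0 + t * t)
    vpx = vx + vy * t
    vpy = vy - vx * t
    vx, vy = vx + vpy * s, vy - vpx * s
    return x + vx * dt, y + vy * dt, vx, vy, dt


def run(n_steps=2000):
    """Push one particle and report final speed plus the smallest and largest dt used."""
    x, y, vx, vy = 0.0, 0.0, 1.0, 0.0
    dts = []
    for _ in range(n_steps):
        x, y, vx, vy, dt = adaptive_boris_step(x, y, vx, vy)
        dts.append(dt)
    return math.hypot(vx, vy), min(dts), max(dts)


if __name__ == "__main__":
    speed, dt_min, dt_max = run()
    print(f"final speed: {speed:.12f}")
    print(f"dt range: {dt_min:.4f} to {dt_max:.4f}")
```

Because dt scales as 1/Ω_c, the rotation angle per step stays constant wherever the particle goes, and speed is still conserved in a pure magnetic field. Variable steps can degrade the long-term behavior that fixed-step Boris is valued for, so any such variant needs its own validation.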

The guiding-center reduction combined with the adaptive Boris push showed the most promise: it lowers the fastest frequency the integrator must resolve by ~10^3×, making the problem tractable on current HPC.
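
The idea behind a guiding-center reduction can be seen in the simplest textbook case, the E × B drift. The sketch below compares a full-orbit Boris integration, which resolves every gyration, with a guiding-center update that takes one step per gyroperiod. It assumes nonrelativistic motion, uniform crossed fields, and units where dv/dt = (q/m)(E + v × B) with q/m = 1. It is not the reduction used in Chan's pipeline.

```python
import math


def boris_step_2d(x, y, vx, vy, ex, ey, bz, dt):
    """Boris step for E in the x-y plane and B along z (q/m = 1)."""
    half = 0.5 * dt
    vx, vy = vx + half * ex, vy + half * ey      # first half electric kick
    t = half * bz
    s = 2.0 * t / (1.0 + t * t)
    vpx = vx + vy * t
    vpy = vy - vx * t
    vx, vy = vx + vpy * s, vy - vpx * s          # magnetic rotation
    vx, vy = vx + half * ex, vy + half * ey      # second half electric kick
    return x + vx * dt, y + vy * dt, vx, vy


def exb_drift(ex, ey, bz):
    """Guiding-center drift velocity E x B / B^2 for B along z."""
    return ey / bz, -ex / bz


def full_orbit_drift(ex, ey, bz, dt=0.05, n_gyroperiods=20):
    """Mean velocity of a particle started at rest, resolving each gyration."""
    period = 2.0 * math.pi / abs(bz)
    n_steps = int(round(n_gyroperiods * period / dt))
    x = y = vx = vy = 0.0
    for _ in range(n_steps):
        x, y, vx, vy = boris_step_2d(x, y, vx, vy, ex, ey, bz, dt)
    elapsed = n_steps * dt
    return x / elapsed, y / elapsed, n_steps


def guiding_center_drift(ex, ey, bz, n_gyroperiods=20):
    """Advance only the guiding center, one step per gyroperiod."""
    period = 2.0 * math.pi / abs(bz)
    ux, uy = exb_drift(ex, ey, bz)
    x = y = 0.0
    for _ in range(n_gyroperiods):
        x += ux * period
        y += uy * period
    elapsed = n_gyroperiods * period
    return x / elapsed, y / elapsed, n_gyroperiods


if __name__ == "__main__":
    fx, fy, f_steps = full_orbit_drift(0.0, 0.1, 1.0)
    gx, gy, g_steps = guiding_center_drift(0.0, 0.1, 1.0)
    print(f"full orbit:     drift = ({fx:.4f}, {fy:.4f}) in {f_steps} steps")
    print(f"guiding center: drift = ({gx:.4f}, {gy:.4f}) in {g_steps} steps")
```

Both approaches recover the same drift, but the guiding-center update needs far fewer steps because it never resolves the gyration. Realistic reductions must also handle field gradients, curvature, and relativistic particles, which is where verification against benchmarks matters.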

Comparison With Other AI Coding Tools

Tool	Strength For This Use Case	Limitation
OpenAI Codex	Generates inspectable numerical schemes; integrated with GitHub	Requires subscription; cloud-only
Claude Code	Strong at mathematical reasoning; runs in a local terminal with the model accessed via API	Not specialized for scientific code gen
Cursor/Copilot	Good for boilerplate; weak on novel algorithms	Limited context for large PIC codes
Local LLMs (Ollama)	Private, free; runs offline	Smaller models struggle with novel numerics

Why Codex won here: The OpenAI Science team provided direct support, and the integration with GitHub Actions made CI validation seamless. For independent researchers, local models are catching up — see our Local LLM Guide.

Sources:
1. OpenAI — How an astrophysicist uses Codex to help simulate black holes (June 11, 2026) — primary case study
2. Event Horizon Telescope Collaboration — First M87* Image (2019) — context on 2019 achievement
3. Frontier Supercomputer Specifications — compute context
4. EHT Theory Working Group Publications — simulation requirements

Bottom Line: The first black hole movie isn’t a telescope problem — it’s a plasma simulation problem. Chan’s Codex workflow is the first credible path to solving it. If you work in computational physics, the barrier to entry is ~2 hours and a GitHub Actions account. Try it.

Editorially independent: we accept no payment for coverage and currently use no affiliate links. Read our Editorial Standards and Corrections Policy. Published: Jun 14, 2026.
