I am a final-year PhD candidate in the Department of Astrophysical and Planetary Sciences at the University of Colorado, Boulder. I obtained my M.S. here in 2023 and my B.S. from the University of Michigan in 2020. I have worked in multiple areas of astrophysics, publishing five first-author papers and two public GitHub codes (see links below).

During my PhD, I have built on my prior experience in observational data reduction and analysis to become skilled in computational astrophysics by working closely with cosmological simulations, statistical methods, and machine learning (ML). I joined the 21 cm cosmology research group of Prof. Jack Burns and Dr. David Rapetti in 2022, during the summer after my first year of graduate school at CU Boulder. I quickly appreciated how the 21 cm signal from neutral hydrogen has vast potential to reveal the physics of the first stars and galaxies and the Dark Ages before them. I discovered the need to improve the final step of data analysis pipelines: accurately recovering the early-Universe physics that describes a measurement of the signal. I have become an expert in developing and implementing open-source, explainable ML software to analyze the sky-averaged (global) 21 cm cosmological signal and perform Bayesian analyses. The ML tools I’ve made help the field move toward this goal by improving the accuracy and speed of constraining nonlinear physical models and their many parameters, which control star formation and ionizing radiation in galaxies and are still largely unknown. Please see below for more detailed descriptions of my research.

I’ve created emulators that are easy to use, adapt, and understand so that anyone can merge them with their own pipelines and simulations to conduct unbiased Bayesian inference. I work closely with cosmological radiative transfer simulations to train and optimize these emulators to quickly mimic physical models. I aim to deeply understand how algorithms are trained and how biases arise to avoid black-box implementation and to maintain connection to the underlying physics. For the next stage of my career, I want to continue developing useful and well-justified ML tools to study astrophysical observations and expand my understanding of the potential and limitations of AI in science.

I began research with Prof. Sally Oey in my sophomore year of undergrad, analyzing Gaia proper motion data, rotational velocities, and masses of massive stars in the SMC to place the first constraints on how so many stars are ejected from their birth clusters. By comparing these data with models, we found that the vast majority of isolated OB stars are runaway stars that were dynamically ejected following binary interactions. After graduating, I worked with Prof. Sean Johnson to analyze Hubble Space Telescope COS UV spectra of about 200 quasars and to clarify the nature of infamous metal lines (O VII) measured toward the bright blazar 1ES 1553+113. This work helped confirm the uncertain redshift of this blazar and provided a robust statistical technique to accurately constrain blazar redshifts using just the edge of the Lyman-alpha forest.

I also love to rock climb, trail run, and play chess and pool.

Recent Work

Fast, Accurate, and Transparent Emulation of the Global 21 cm Cosmology Signal

The above images (Dorigo Jones et al. 2025) show schematics of the Kolmogorov-Arnold Network (KAN) compared to a traditional fully-connected neural network (top), and the activation functions learned on each edge in the first hidden layer of the 21cmKAN emulator when trained on simulated global 21 cm signals (bottom). The KAN is a novel type of neural network that can approximate PDE solvers by capturing the underlying functional composition. KANs differ from traditional NNs by directly learning the activation function shapes, making them more expressive. The emulator I built and released, 21cmKAN (github.com/jdorigojones/21cmKAN), trains in just 10 minutes (75x faster than 21cmLSTM) when utilizing a typical A100 GPU, produces unbiased posteriors like 21cmLSTM, and provides an intuitive understanding of its predictions via its transparent, flexible activation functions. Together, 21cmLSTM and 21cmKAN offer unprecedented accuracy and eliminate training as a bottleneck in comprehensive inference pipelines, and my open-source GitHub codes for these emulators are readily adaptable and integrable for members of the community.
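As a rough illustration of the difference described above, the sketch below contrasts a KAN-style layer, where each edge carries its own learnable one-dimensional function, with a traditional layer that uses scalar weights and a fixed activation. It is not the 21cmKAN implementation: the class names, the triangular (order-1 B-spline) basis, the grid range, and the layer sizes are all assumptions chosen to keep the example short and dependency-free.

```python
import math
import random


def hat_basis(x, centers, width):
    """Evaluate triangular (order-1 B-spline) basis functions at x."""
    return [max(0.0, 1.0 - abs(x - c) / width) for c in centers]


class KANEdge:
    """One learnable activation: phi(x) = sum_k coef[k] * B_k(x).

    The coefficients are the trainable parameters, so the *shape* of the
    activation on this edge is what gets learned.
    """

    def __init__(self, centers, width, rng):
        self.centers = centers
        self.width = width
        self.coef = [rng.uniform(-0.1, 0.1) for _ in centers]

    def __call__(self, x):
        basis = hat_basis(x, self.centers, self.width)
        return sum(c * b for c, b in zip(self.coef, basis))


class KANLayer:
    """Hypothetical KAN-style layer: each output sums one edge function per input."""

    def __init__(self, n_in, n_out, n_basis=8, x_min=-1.0, x_max=1.0, seed=0):
        rng = random.Random(seed)
        width = (x_max - x_min) / (n_basis - 1)
        centers = [x_min + k * width for k in range(n_basis)]
        self.edges = [
            [KANEdge(centers, width, rng) for _ in range(n_in)]
            for _ in range(n_out)
        ]

    def forward(self, x):
        return [sum(edge(xi) for edge, xi in zip(row, x)) for row in self.edges]


def mlp_layer(x, weights, biases):
    """Traditional layer for contrast: learnable scalar weights, fixed tanh."""
    return [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


# Usage: three inputs (e.g. rescaled model parameters) mapped to two outputs.
layer = KANLayer(n_in=3, n_out=2)
print(layer.forward([0.1, -0.5, 0.9]))
```

Because every edge function can be evaluated and plotted on its own, this kind of layer can be inspected directly, which is the sense in which the learned activations are transparent.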

A Memory-based Emulator of the Global 21 cm Signal with Unprecedented Accuracy

The above image (Dorigo Jones et al. 2024) shows the Bayesian posterior signal constraints (shown in red) when using the 21cmLSTM emulator to fit mock data realizations of the global 21 cm signal with different levels of added observational noise. These results, along with those presented in the paper, demonstrate that 21cmLSTM can be used to not only quickly and accurately emulate the signal for different models, but also to obtain unbiased constraints when fitting even very optimistic measurements of the signal. The emulator is publicly available on my GitHub page (github.com/jdorigojones/21cmLSTM) so that it can be easily employed and retrained for different models or data sets. 21cmLSTM adds to the growing body of astrophysics and cosmology research finding that LSTM recurrent neural networks can perform as well as or better than fully connected neural networks in prediction or classification of data that contains intrinsic correlation over time.

Validating posteriors obtained by an emulator for the global 21-cm signal

The above image (Dorigo Jones et al. 2023) shows the posterior distributions obtained via Bayesian nested sampling when jointly fitting mock data of the global 21-cm signal and high-z galaxy UV luminosity function (UVLF). We use either a neural-network-based emulator as the model in the likelihood (shown in red), or the full model on which the emulator was trained, ARES (shown in black). For a standard noise level of 25 mK in the 21-cm signal being fit, ARES finds the correct parameter values to within 1σ, while the emulator is biased for two parameters by 3σ. These results, along with the posteriors for other noise levels, give a comprehensive view of the astrophysical constraints that can be expected for 21-cm cosmology experiments when including UVLF data.

Origins of Massive Runaway Stars

The above image (Oey, Dorigo Jones et al. 2018) shows the transverse velocity vectors for 315 isolated OB stars in the Small Magellanic Cloud (SMC). Longer velocity vectors indicate faster runaway stars, the vast majority of which were dynamically ejected from their birth clusters due to close gravitational interactions (Dorigo Jones et al. 2020). Interestingly, the stars in the lower left region of the SMC show a systemic velocity toward the LMC, which is indirect evidence of a direct collision between the Magellanic Clouds predicted to have occurred ~150 Myr ago.

Improving Blazar Redshift Constraints

The above image (Dorigo Jones et al. 2022) depicts the intrinsic scatter in the gap between AGN systemic redshifts (zsys) and the highest-redshift H I Lyα absorption line detected toward them (max(zLyα)), for 192 AGN at z < 0.5. We use the 95% confidence interval (highlighted) as a robust redshift estimator when combined with an object's UV-detected max(zLyα). We improve the redshift estimates of two blazars and confirm the galaxy group association of 1ES 1553+113. Among other potential applications, this redshift constraint technique may improve searches for the highly-ionized IGM performed toward bright blazars with X-ray telescopes.
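The estimator described above can be sketched in a few lines. This is a minimal illustration, not the published analysis: it assumes the interval is the central 95% of the empirical gap distribution (zsys minus max(zLyα)), which the summary above does not specify, and the function names and the evenly spaced gap values are made up purely for demonstration.

```python
def percentile(sorted_vals, q):
    """Linearly interpolated percentile (q in [0, 100]) of pre-sorted values."""
    if not sorted_vals:
        raise ValueError("need at least one value")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def redshift_interval(gaps, max_z_lya, level=95.0):
    """Bracket an object's systemic redshift from its highest-z Lya absorber.

    gaps: reference-sample values of (zsys - max(zLya)).
    Returns (lower, upper) bounds on zsys at the requested level,
    assuming a central (two-sided) interval.
    """
    ordered = sorted(gaps)
    tail = (100.0 - level) / 2.0
    return (
        max_z_lya + percentile(ordered, tail),
        max_z_lya + percentile(ordered, 100.0 - tail),
    )


# Synthetic, evenly spaced gaps for illustration only (not measured values).
example_gaps = [i * 0.001 for i in range(101)]
print(redshift_interval(example_gaps, max_z_lya=0.4))
```

In practice the reference gaps would come from a sample of AGN with known systemic redshifts, and the resulting bounds would apply only to objects for which max(zLyα) is measured in the same way.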

Press

Contact

  • Email: johnny.dorigojones(at)colorado(dot)edu