APBench and benchmarking large language model performance in fundamental astrodynamics problems for space engineering

Wu, Di; Zhang, Raymond; Zucchelli, Enrico M.; Chen, Yongchao; Linares, Richard

doi:10.1038/s41598-025-91150-5


Astronomy · Niche

advanced

Wu, Di et al. (2025)

Published: Mar 7, 2025
Journal: Scientific Reports · Vol. 15 · No. 1

At a Glance

How well can LLMs solve university-level space science problems?

Summary

The authors created APBench, a dataset of astrodynamics problems, tested a variety of LLMs (including open-source models) on it, and evaluated their performance. The paper is a good example of how to design and conduct a benchmark study, and it is helpful for anyone building benchmarks in astronomy.

Method Snapshot

LLM (benchmark)

Background

Deep knowledge of LLM benchmarking and the current state of the art

In Collections

Astronomy · 2 papers

LLMs in dynamical astronomy

How large language models are transforming astronomical research in celestial mechanics and dynamical astronomy.

Evgeny Smirnov

A nice example of the usefulness of LLMs in astronomy, and a good example of how to conduct a benchmark study that combines astronomy and LLMs.

— ES