APBench and benchmarking large language model performance in fundamental astrodynamics problems for space engineering (pdf)
Di Wu, Raymond Zhang, Enrico M. Zucchelli et al. · 2025 · Scientific Reports
At a Glance
How well can LLMs solve university-level space science problems?
Summary
The authors created a dataset of astrodynamics questions, ran a variety of LLMs on it (including open-source models), and evaluated their performance. The paper is a good example of how to design and conduct a benchmark study, and is helpful for anyone benchmarking LLMs in astronomy.
A nice example of how useful LLMs can be in astronomy, and a good example of how to run a benchmark study that combines astronomy and LLMs.
— ES
- Method:
- LLM (benchmark)
- Background:
- Deep knowledge of LLM benchmarking and the state of the art
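
The workflow described in the summary (collect questions, query several models, score the answers) can be sketched roughly as follows. This is a minimal illustration and not the paper's actual pipeline: the question format, the 1% relative tolerance, and the stand-in "models" (`oracle`, `always_zero`) are all assumptions made for the example. A real run would replace the stand-ins with calls to actual LLMs and a parser for their free-text answers.

```python
import math
from typing import Callable, Dict, List, Optional

MU_EARTH = 398600.4418  # Earth's gravitational parameter, km^3/s^2

# Hypothetical question format: a prompt plus a numeric reference answer.
QUESTIONS: List[dict] = [
    {
        "prompt": "Period (s) of a circular Earth orbit with radius 7000 km?",
        "answer": 2 * math.pi * math.sqrt(7000.0**3 / MU_EARTH),
    },
    {
        "prompt": "Circular orbital speed (km/s) at radius 7000 km from Earth's center?",
        "answer": math.sqrt(MU_EARTH / 7000.0),
    },
]

# A "model" here is any callable mapping a prompt to a number (or None if it
# fails to produce one). This signature is an assumption for the sketch.
Model = Callable[[str], Optional[float]]


def is_correct(predicted: Optional[float], reference: float, rel_tol: float = 0.01) -> bool:
    """Count an answer as correct if it is within rel_tol of the reference."""
    if predicted is None:
        return False
    return math.isclose(predicted, reference, rel_tol=rel_tol)


def evaluate(models: Dict[str, Model], questions: List[dict], rel_tol: float = 0.01) -> Dict[str, float]:
    """Return the fraction of questions each model answers correctly."""
    scores: Dict[str, float] = {}
    for name, model in models.items():
        n_correct = sum(
            is_correct(model(q["prompt"]), q["answer"], rel_tol) for q in questions
        )
        scores[name] = n_correct / len(questions)
    return scores


# Stand-in models used only to exercise the harness.
_LOOKUP = {q["prompt"]: q["answer"] for q in QUESTIONS}


def oracle(prompt: str) -> Optional[float]:
    """Always returns the reference answer."""
    return _LOOKUP.get(prompt)


def always_zero(prompt: str) -> Optional[float]:
    """Always answers 0.0, so it never matches a nonzero reference."""
    return 0.0


if __name__ == "__main__":
    results = evaluate({"oracle": oracle, "always_zero": always_zero}, QUESTIONS)
    for name, accuracy in results.items():
        print(f"{name}: {accuracy:.0%}")
```

Scoring by numeric tolerance is only one possible choice; a benchmark with free-text or multi-step answers would need a different grading scheme.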