Abstract:
Computational models play an increasingly vital role in scientific research by enabling the numerical simulation of complex processes. Such models are fundamental in the geosciences, where they offer critical insights into the impacts of global change on the Earth system today and in the future. Beyond their value as research tools, models are also software products and should therefore adhere to established software engineering standards. However, scientists are rarely trained as software developers, which can lead to deficiencies in software quality such as unreadable, inefficient, or erroneous code. The complexity of models, coupled with their integration into broader workflows, also often makes it challenging to reproduce results, evaluate processes, and build upon them.
In this paper, we review current development practices for the state-of-the-art land surface models used by the Global Carbon Budget. We combine the experience of modelers from the respective research groups with the expertise of software engineers from tech companies to outline key principles and tools for improving software quality in research. We explore four main areas: (1) model testing and validation, (2) scientific, technical, and user documentation, (3) version control, continuous integration, and code review, and (4) the portability and reproducibility of workflows.
Our review reveals that while modeling communities are incorporating many best practices, significant room for improvement remains in areas such as automated testing, automated documentation, and reproducibility. We therefore identify and promote essential software engineering practices and present numerous examples from within the community that can serve as guidelines for other models and could help streamline processes across the entire community.
We conclude with an open-source example implementation of these principles, demonstrating portable and reproducible data flows, a continuous integration setup, and web-based visualizations. This example may serve as a practical resource for model developers, users, and all scientists engaged in scientific programming.