



Journal Article

Choosing multiple linear regressions for weather-based crop yield prediction with ABSOLUT v1.2 applied to the districts of Germany


Conradt,  Tobias
Potsdam Institute for Climate Impact Research;

External Resource

(Supplementary material)


Fulltext (public)

(Publisher version), 6MB

Supplementary Material (public)

(Supplementary material), 117KB


Conradt, T. (2022): Choosing multiple linear regressions for weather-based crop yield prediction with ABSOLUT v1.2 applied to the districts of Germany. - International Journal of Biometeorology, 66, 11, 2287-2300.

Cite as: https://publications.pik-potsdam.de/pubman/item/item_27363
ABSOLUT v1.2 is an adaptive algorithm that uses correlations between time-aggregated weather variables and crop yields for yield prediction. In contrast to conventional regression-based yield prediction methods, a very broad range of possible input features and their combinations is exhaustively tested for maximum explanatory power. Weather variables such as temperature, precipitation, and sunshine duration are aggregated over different seasonal time periods preceding the harvest, yielding 45 potential input features per original variable. In a first step, this large set of features is reduced to those aggregates most likely to hold explanatory power for observed yields. The second, computationally demanding step evaluates predictions for all districts using all possible combinations of the retained features. Step three selects those combinations of weather features that showed the highest predictive power across districts. Finally, the district-specific best-performing regressions among these are used for the actual prediction, and the results are spatially aggregated. To evaluate the new approach, ABSOLUT v1.2 is applied to predict the yields of silage maize, winter wheat, and other major crops in Germany based on two decades of data from about 300 districts. It turned out to be crucial not only to make out-of-sample predictions (based solely on data excluding the target year) but also to consistently separate training and testing years during feature selection. Otherwise, the prediction accuracy would be overestimated by far. The question arises whether performances claimed for other statistical modelling examples are often upward-biased because input variables were selected in disregard of the out-of-sample principle.
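
The point about separating training and testing years during feature selection can be illustrated with a small Python sketch. This is not the ABSOLUT v1.2 code; it is a simplified, single-district illustration under stated assumptions. In particular, the abstract does not say how the 45 aggregates are formed; the sketch assumes all contiguous windows over nine months (9 × 10 / 2 = 45), and every function name, the preselection size `n_pre`, and the combination size `k` are hypothetical. `naive_rmse` selects features using all years before the leave-one-year-out evaluation, whereas `nested_rmse` repeats the selection inside each fold so that the target year never influences it.

```python
import itertools
import numpy as np


def contiguous_windows(n_months):
    """All (start, end) month windows, end inclusive: n * (n + 1) / 2 of them."""
    return [(s, e) for s in range(n_months) for e in range(s, n_months)]


def aggregate_features(monthly):
    """Turn a (n_years, n_months) array into one window-mean column per window.

    Assumption: contiguous monthly windows; the paper's aggregation may differ.
    """
    cols = [monthly[:, s:e + 1].mean(axis=1)
            for s, e in contiguous_windows(monthly.shape[1])]
    return np.column_stack(cols)


def fit_predict(X_train, y_train, X_test):
    """Ordinary least squares with an intercept, evaluated on X_test."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef


def loo_rmse(X, y):
    """Leave-one-year-out RMSE for a fixed set of feature columns."""
    n = len(y)
    errors = []
    for t in range(n):
        train = np.arange(n) != t
        pred = fit_predict(X[train], y[train], X[~train])
        errors.append(pred[0] - y[t])
    return float(np.sqrt(np.mean(np.square(errors))))


def select_features(X, y, n_pre=6, k=2):
    """Preselect n_pre features by |correlation| with y, then pick the
    k-feature combination with the lowest leave-one-year-out RMSE."""
    corr = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    candidates = [int(j) for j in np.argsort(corr)[-n_pre:]]
    combos = itertools.combinations(candidates, k)
    return min(combos, key=lambda c: loo_rmse(X[:, list(c)], y))


def naive_rmse(X, y):
    """Biased protocol: features are selected with the target years included."""
    cols = list(select_features(X, y))
    return loo_rmse(X[:, cols], y)


def nested_rmse(X, y):
    """Strict protocol: selection is repeated without the held-out year."""
    n = len(y)
    errors = []
    for t in range(n):
        train = np.arange(n) != t
        cols = list(select_features(X[train], y[train]))
        pred = fit_predict(X[train][:, cols], y[train], X[~train][:, cols])
        errors.append(pred[0] - y[t])
    return float(np.sqrt(np.mean(np.square(errors))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    monthly = rng.normal(size=(20, 9))   # synthetic weather, 20 years x 9 months
    yields = rng.normal(size=20)         # synthetic yields, unrelated to weather
    X = aggregate_features(monthly)
    print("features per variable:", X.shape[1])
    print("naive RMSE: ", naive_rmse(X, yields))
    print("nested RMSE:", nested_rmse(X, yields))
```

Because the synthetic yields here are pure noise, any apparent skill reported by the naive protocol would stem from selection alone; comparing the two printed values on such data is one way to gauge how large that optimism can be for a given number of years and candidate features.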