Journal Article

Choosing multiple linear regressions for weather-based crop yield prediction with ABSOLUT v1.2 applied to the districts of Germany

Authors
Conradt, Tobias
Potsdam Institute for Climate Impact Research

Fulltext (public)

27363oa.pdf
(Publisher version), 6MB

Citation

Conradt, T. (2022): Choosing multiple linear regressions for weather-based crop yield prediction with ABSOLUT v1.2 applied to the districts of Germany. - International Journal of Biometeorology, 66, 11, 2287-2300.
https://doi.org/10.1007/s00484-022-02356-5


Cite as: https://publications.pik-potsdam.de/pubman/item/item_27363
Abstract
ABSOLUT v1.2 is an adaptive algorithm that uses correlations between time-aggregated weather variables and crop yields for yield prediction. In contrast to conventional regression-based yield prediction methods, a very broad range of possible input features and their combinations is exhaustively tested for maximum explanatory power. Weather variables such as temperature, precipitation, and sunshine duration are aggregated over different seasonal time periods preceding the harvest, yielding 45 potential input features per original variable. In a first step, this large set of features is reduced to those aggregates most likely to hold explanatory power for observed yields. The second, computationally demanding step evaluates predictions for all districts using every possible combination of the retained features. Step three selects those combinations of weather features that showed the highest predictive power across districts. Finally, the district-specific best-performing regressions among these are used for the actual prediction, and the results are spatially aggregated. To evaluate the new approach, ABSOLUT v1.2 is applied to predict the yields of silage maize, winter wheat, and other major crops in Germany based on two decades of data from about 300 districts. It proved crucial not only to make out-of-sample predictions (based solely on data excluding the target year) but also to consistently separate training and testing years during feature selection. Otherwise, the prediction accuracy would be greatly overestimated. This raises the question of whether the performance claimed for other statistical models is often upward-biased because input variables were selected without regard to the out-of-sample principle.
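The two central ideas of the abstract, aggregating weather over seasonal windows and keeping feature selection strictly out-of-sample, can be illustrated with a minimal Python sketch. It is not the ABSOLUT v1.2 implementation. All function names are hypothetical, and the regression is reduced to a single feature for brevity, whereas ABSOLUT uses multiple linear regressions. The abstract does not say how the 45 features are constructed; the sketch assumes, purely for illustration, all contiguous windows within a nine-month span, which happens to give 9 * 10 / 2 = 45 aggregates.

```python
from statistics import mean


def window_aggregates(monthly, agg=sum):
    """Aggregate one monthly series over every contiguous window of months.

    n months give n * (n + 1) / 2 windows, i.e. 45 for n = 9.
    The nine-month span is an assumption, not taken from the source.
    """
    n = len(monthly)
    return {
        (start, end): agg(monthly[start:end + 1])
        for start in range(n)
        for end in range(start, n)
    }


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:  # constant feature: fall back to the mean yield
        return 0.0, my
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def training_error(x, y):
    """Sum of squared residuals of the fitted line on the training data."""
    slope, intercept = fit_line(x, y)
    return sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))


def leave_one_year_out(features, yields):
    """Predict each year's yield without ever using that year.

    features: dict mapping a feature name to one value per year.
    yields:   observed yields, one value per year.
    The feature is chosen anew inside every fold from training years only,
    so the target year influences neither selection nor fitting.
    """
    years = range(len(yields))
    predictions = []
    for target in years:
        train = [i for i in years if i != target]
        y_train = [yields[i] for i in train]
        best = min(
            features,
            key=lambda name: training_error(
                [features[name][i] for i in train], y_train
            ),
        )
        slope, intercept = fit_line([features[best][i] for i in train], y_train)
        predictions.append(slope * features[best][target] + intercept)
    return predictions


# Toy data: feature "a" determines the yield exactly, "b" is unrelated.
features = {"a": [1, 2, 3, 4, 5], "b": [5, 1, 4, 2, 3]}
yields = [3, 5, 7, 9, 11]
predictions = leave_one_year_out(features, yields)
```

Selecting the best feature once on all years and only then running the leave-one-year-out loop would let the target year leak into the selection, which is the kind of upward bias the abstract warns about.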