Assessing Machine Learning Models for Gap Filling Daily Rainfall Series in a Semiarid Region of Spain

Bellido-Jiménez, Juan Antonio; Estévez Gualda, Javier; García-Marín, A.P.

Authors

Bellido-Jiménez, Juan Antonio

Estévez Gualda, Javier

García-Marín, A.P.

Publisher

MDPI

Date

2021

Subjects

Gap-filling
Rainfall series
Machine learning
Bayesian optimization

Abstract

Missing data are a common problem in hydrometeorological datasets, usually caused by sensor malfunction, deficiencies in record storage and transmission, or problems in data recovery procedures. These missing values are the primary source of problems when analyzing and modeling the spatial and temporal variability of such data. Accurate gap-filling techniques for rainfall time series are therefore needed to obtain complete datasets, which are crucial for studying the evolution of climate change. In this work, several machine learning models were assessed for gap-filling rainfall data, using different approaches and locations in the semiarid region of Andalusia (Southern Spain). Based on the results, using neighbor data from within a 50 km radius clearly outperformed the other approaches assessed, reaching in the best cases RMSE (root mean squared error) values of 1.246 mm/day, MBE (mean bias error) values of −0.001 mm/day, and R² values of 0.898. In addition, results for inland areas outperformed those for coastal areas in most locations, showing that model efficiency depends on the distance to the sea (an improvement of up to 63.89% in terms of RMSE). Finally, machine learning (ML) models, especially the multilayer perceptron (MLP), notably outperformed simple linear regression estimates at the coastal sites, whereas at inland locations the improvements were less significant.
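As a rough illustration of the workflow the abstract describes, the sketch below fills gaps in a target rainfall series using a simple linear regression on one neighbor series (the baseline the ML models are compared against) and computes the three reported metrics. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`rmse`, `mbe`, `r2`, `fill_gaps_linear`) are hypothetical, the data are synthetic, MBE is assumed to be the mean of estimated minus observed values, and R² is assumed to be the coefficient of determination (the paper may define it as the squared correlation instead).

```python
import numpy as np


def rmse(obs, est):
    """Root mean squared error (same units as the data, e.g. mm/day)."""
    obs, est = np.asarray(obs, float), np.asarray(est, float)
    return float(np.sqrt(np.mean((est - obs) ** 2)))


def mbe(obs, est):
    """Mean bias error; assumed sign convention: estimated minus observed."""
    obs, est = np.asarray(obs, float), np.asarray(est, float)
    return float(np.mean(est - obs))


def r2(obs, est):
    """Coefficient of determination (assumed definition of R²)."""
    obs, est = np.asarray(obs, float), np.asarray(est, float)
    ss_res = np.sum((obs - est) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def fill_gaps_linear(target, neighbor):
    """Fill NaN gaps in `target` with a least-squares line fitted on the
    days where both the target and the neighbor series are observed."""
    target = np.asarray(target, float)
    neighbor = np.asarray(neighbor, float)
    both = ~np.isnan(target) & ~np.isnan(neighbor)
    slope, intercept = np.polyfit(neighbor[both], target[both], 1)
    filled = target.copy()
    gaps = np.isnan(target) & ~np.isnan(neighbor)
    # Rainfall cannot be negative, so clip the estimates at zero.
    filled[gaps] = np.clip(slope * neighbor[gaps] + intercept, 0.0, None)
    return filled


# Synthetic daily rainfall (mm/day); NaN marks a missing record.
target = np.array([0.0, 2.0, np.nan, 4.0, 0.0, np.nan])
neighbor = np.array([0.0, 1.0, 3.0, 2.0, 0.0, 0.5])
filled = fill_gaps_linear(target, neighbor)
print(filled)
```

In practice the metrics would be computed on records that were deliberately withheld, comparing the withheld observations against the estimates, and the regression could be replaced by an ML model such as an MLP fed with several neighbor series.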

URI

http://hdl.handle.net/10396/21746

Source

Atmosphere 12(9), 1158 (2021)

Publisher's version

https://doi.org/10.3390/atmos12091158