HydroPredictor: A Hybrid Machine Learning Model for Addressing Data Scarcity in Groundwater Prediction

Year: 2025

Publisher: Nature Portfolio

Source Title: Scientific Reports

Abstract

Groundwater prediction in data-scarce and environmentally sensitive regions is a persistent challenge because of limited observational data, spatial heterogeneity, and the nonlinear nature of hydrogeological processes. In this study, we propose HydroPredictor, a hybrid machine learning framework that combines the efficient categorical-feature handling of CatBoost with the nonlinear feature learning capacity of a regularized Multi-Layer Perceptron (MLP). The model was trained on a georeferenced dataset of 315 samples from the Feija Basin in southeastern Morocco, using ten environmental predictors, including elevation, rainfall, soil permeability, NDVI, and the topographic wetness index. The pipeline includes Optuna-based hyperparameter optimization and 5-fold cross-validation to assess robustness and generalization. HydroPredictor achieved a testing accuracy of 89.23%, an F1-score of 0.8937, and Area Under the Curve (AUC) values above 0.90 for all groundwater potential classes. Friedman and Wilcoxon signed-rank tests (p < 0.05) confirmed that it significantly outperformed conventional models, including Random Forest, Support Vector Machine (SVM), and a standalone MLP. HydroPredictor also generalized better than models reported in prior studies (e.g., RF-SSA: AUC = 0.840; GBDT: AUC = 0.88) while showing minimal overfitting (ΔAccuracy = 0.35%). By combining interpretable tree-based embeddings with deep neural representations, HydroPredictor provides a robust and scalable solution for groundwater classification in data-limited settings and a reproducible, operationally relevant tool for sustainable groundwater resource management under climatic and environmental uncertainty.
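The hybrid design summarized above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. scikit-learn's GradientBoostingClassifier stands in for CatBoost. The "tree-based embedding" is assumed to be the one-hot encoding of the leaf each sample reaches in every boosted tree, concatenated with the scaled raw predictors and fed to an L2-regularized MLP. The class name HybridTreeMLP, the helper cross_validate, and all hyperparameter values are hypothetical. The data are synthetic (315 samples, 10 predictors, an assumed 3 classes) rather than the Feija Basin dataset, and the Optuna search is omitted in favor of hand-fixed settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import OneHotEncoder, StandardScaler


class HybridTreeMLP:
    """Hypothetical tree-embedding + MLP hybrid (GradientBoosting stands in for CatBoost)."""

    def __init__(self, seed=0):
        self.trees = GradientBoostingClassifier(n_estimators=50, max_depth=3, random_state=seed)
        self.encoder = OneHotEncoder(handle_unknown="ignore")
        self.scaler = StandardScaler()
        # alpha is the L2 penalty that regularizes the MLP weights (value is an assumption).
        self.mlp = MLPClassifier(hidden_layer_sizes=(64, 32), alpha=1e-3,
                                 max_iter=2000, random_state=seed)

    def _leaves(self, X):
        # apply() returns the leaf index reached in every tree; flatten to one column per tree.
        return self.trees.apply(X).reshape(len(X), -1)

    def _features(self, X):
        # Scaled raw predictors + one-hot leaf memberships (the assumed "tree embedding").
        return np.hstack([self.scaler.transform(X),
                          self.encoder.transform(self._leaves(X)).toarray()])

    def fit(self, X, y):
        self.trees.fit(X, y)
        self.encoder.fit(self._leaves(X))
        self.scaler.fit(X)
        self.mlp.fit(self._features(X), y)
        return self

    def predict(self, X):
        return self.mlp.predict(self._features(X))


def cross_validate(X, y, n_splits=5, seed=0):
    """Stratified k-fold CV; every component is refit on each training fold."""
    accs, f1s = [], []
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train, test in folds.split(X, y):
        model = HybridTreeMLP(seed=seed).fit(X[train], y[train])
        pred = model.predict(X[test])
        accs.append(accuracy_score(y[test], pred))
        # Macro-averaged F1 is an assumption; the abstract does not state the averaging.
        f1s.append(f1_score(y[test], pred, average="macro"))
    return accs, f1s


# Synthetic stand-in: 315 samples x 10 predictors, 3 hypothetical potential classes.
X, y = make_classification(n_samples=315, n_features=10, n_informative=6,
                           n_classes=3, class_sep=1.5, random_state=42)
accs, f1s = cross_validate(X, y)
print(f"accuracy {np.mean(accs):.3f} +/- {np.std(accs):.3f}, macro F1 {np.mean(f1s):.3f}")
```

Refitting the tree embedder, encoder, and scaler inside each fold keeps test-fold information out of training, which matters when only a few hundred samples are available.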