Abstract
This study presents an in-depth analysis of machine learning (ML) techniques for predicting the water quality index and classifying water quality, using a dataset of water quality metrics (temperature, specific conductance, salinity, dissolved oxygen, depth, pH, and turbidity) collected from multiple monitoring stations. Data preprocessing included imputation of missing values, feature scaling, and categorical encoding, so that input features were on comparable scales. We evaluated artificial neural networks, decision trees, support vector machines, random forests, XGBoost, and long short-term memory (LSTM) networks. XGBoost and LSTM significantly outperformed the other models, with XGBoost achieving accuracies of 99.07–99.99% and LSTM attaining an R² of 0.9999. Compared with prior studies, our approach improves predictive accuracy and robustness and shows strong generalization. The proposed models handle complex, multivariate water quality data markedly better than traditional methods, making them promising tools for water quality prediction and environmental management. These findings underscore the potential of ML for building reliable, scalable water quality monitoring solutions and offer useful insights for policymakers and environmental managers working toward sustainable water resource management.