Abstract
This study proposes hybrid machine learning models for predicting dye removal efficiency. The hyperparameters of an XGBoost model were tuned with differential evolution (DE), a genetic algorithm (GA), random search (RS), and grid search (GS) to obtain more accurate predictions. The input variables were the initial concentrations of the Fast Green, Eosin Y, and Quinine Yellow dyes, their initial pH, the dosage of the ACMOF adsorbent (activated carbons: metal‒organic frameworks), and the sonication time; the output was the removal percentage. Based on the correlation between the inputs and the output, four scenarios were generated: 4 inputs, 3 inputs, 2 inputs, and 1 input. The correlation analysis revealed a weak input‒output relationship and the presence of outliers in the data. Advanced models such as XGBoost nevertheless improved performance and predicted dye removal efficiency accurately. The models performed well across the different input scenarios, demonstrating their reliability and effectiveness. The results also revealed the importance of data preprocessing techniques in improving the structure of, and the relationships within, the data. The DE_XGBoost model outperformed all the other methods in terms of R² (0.977, 0.958, 0.924, and 0.997 for the 4-input, 3-input, 2-input, and 1-input scenarios, respectively), demonstrating the effectiveness of DE in improving the generalization and predictive ability of the model. This research contributes to the development of more efficient techniques for dye removal, addressing the challenges of traditional testing methods. The findings have implications for industries that use dyes and can help mitigate the environmental pollution caused by dye effluents.
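To illustrate the kind of search that DE performs during hyperparameter tuning, the following is a minimal sketch of the classic DE/rand/1/bin scheme in plain Python. It is not the implementation used in this study. The objective `toy_validation_error`, the two hyperparameters (`learning_rate`, `max_depth`), their bounds, and the DE settings are all hypothetical; in practice the objective would be replaced by the cross-validated error of an XGBoost model trained with the candidate hyperparameters.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=100, seed=0):
    """Minimize `objective` inside box `bounds` using DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Initialize the population uniformly inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # forces at least one mutated gene
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if rng.random() < cr or j == j_rand:
                    # Mutation: base vector plus scaled difference vector.
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    v = min(max(v, lo), hi)  # clip back into the bounds
                else:
                    v = pop[i][j]  # inherit the gene from the parent
                trial.append(v)
            trial_score = objective(trial)
            # Greedy selection: keep the trial only if it is no worse.
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def toy_validation_error(params):
    """Hypothetical stand-in for the cross-validated error of a model.

    Assumed (for illustration only) to be smallest at learning_rate = 0.1
    and max_depth = 6.
    """
    learning_rate, max_depth = params
    return (learning_rate - 0.1) ** 2 + ((max_depth - 6.0) / 10.0) ** 2


# Hypothetical search ranges for (learning_rate, max_depth).
BOUNDS = [(0.01, 0.5), (2.0, 12.0)]
best_params, best_error = differential_evolution(toy_validation_error, BOUNDS)
print(best_params, best_error)
```

With a real model, integer-valued hyperparameters such as tree depth would need to be rounded inside the objective, and each objective call would involve a full training and validation run, so the population size and number of generations control the tuning cost.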