Abstract
In the era of machine learning, classification problems are solved by training on labelled classes. However, when some training classes contain insufficient data, the model learns these minority classes inadequately: its predictions for them are poor and biased towards the classes with more data. This is known as the class imbalance problem. In such cases, standard classifiers tend to be overwhelmed by the large classes and to disregard the small ones. As a result, the performance of machine learning and deep learning algorithms degrades and can become unacceptable, especially on crucial data such as medical and health records. Although various researchers have proposed methods to address this problem, most are problem-specific and suited to a particular classifier only. To find a generalized and effective solution, we apply various SMOTE variants to reduce the imbalance in the datasets and thereby improve the performance of several machine learning and deep learning algorithms. We experiment with and analyze the effects of SMOTE variants on various machine learning techniques over six standard medical datasets. We find that SMOTE variants are very effective and that they improve the standard performance measures (Accuracy, Precision, Recall, and F1-Score). Additionally, based on our results, it is feasible to determine which SMOTE variant works best with which machine learning methods and datasets.
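To illustrate the oversampling idea underlying SMOTE (interpolating between a minority-class sample and one of its nearest minority-class neighbours), the following is a minimal numpy sketch. It is an assumption-laden illustration, not the specific SMOTE variants evaluated in the paper; the function name `smote_oversample` and its parameters are hypothetical.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE sketch (illustrative, not the paper's implementation):
    synthesize n_new points by interpolating each randomly chosen minority
    sample with one of its k nearest minority-class neighbours.
    X_min is an (n, d) array of minority-class samples."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # pairwise squared distances within the minority class
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)          # exclude each sample from its own neighbours
    k = min(k, n - 1)
    nn = np.argsort(d2, axis=1)[:, :k]    # indices of k nearest neighbours per sample
    base = rng.integers(0, n, size=n_new)            # random base sample for each new point
    nbr = nn[base, rng.integers(0, k, size=n_new)]   # random neighbour of that base
    gap = rng.random((n_new, 1))                     # interpolation factor in [0, 1)
    # each synthetic point lies on the segment between base and neighbour
    return X_min[base] + gap * (X_min[nbr] - X_min[base])

# Usage: oversample a toy 2-D minority class of four points
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_syn = smote_oversample(X_min, n_new=6, k=2, rng=0)
```

Because each synthetic point is a convex combination of two existing minority samples, the new points stay inside the region spanned by the minority class rather than duplicating existing samples, which is what distinguishes SMOTE from plain random oversampling.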