A study on classifying imbalanced datasets

Year : 2014

Publisher : Institute of Electrical and Electronics Engineers Inc.

Source Title : 1st International Conference on Networks and Soft Computing, ICNSC 2014 - Proceedings

Abstract

Many real-world problems are modeled as binary classification problems, and the samples of one class often outnumber those of the other. This imbalance reduces prediction accuracy on minority class samples while overall accuracy remains high. Ignoring the misclassification rate of the minority class causes severe problems in many applications, such as detecting fraudulent credit card transactions, medical diagnosis, and e-mail foldering. Many classification algorithms in the literature are designed for balanced datasets and treat majority and minority class samples equally. In this study, the existing solutions to the class imbalance problem and the evaluation techniques commonly used for it are reviewed. The solutions were applied to three real-world datasets. A combination of SMOTE and Bagging with Random Forest was observed to produce the best overall accuracy on the minority class.
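
The gap between overall accuracy and minority class accuracy described in the abstract can be illustrated with a minimal sketch. The data are hypothetical (990 majority and 10 minority samples, not one of the study's datasets), the classifier is a trivial one that always predicts the majority class, and the helper name `evaluate` is an assumption introduced only for this illustration.

```python
def evaluate(y_true, y_pred, minority=1):
    """Return (overall accuracy, recall on the minority class)."""
    # Overall accuracy: fraction of all samples predicted correctly.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)

    # Minority recall: fraction of minority samples predicted as minority.
    minority_total = sum(t == minority for t in y_true)
    minority_hits = sum(
        t == minority and p == minority for t, p in zip(y_true, y_pred)
    )
    recall = minority_hits / minority_total if minority_total else 0.0
    return accuracy, recall


# Hypothetical imbalanced labels: 0 = majority class, 1 = minority class.
y_true = [0] * 990 + [1] * 10

# A trivial classifier that always predicts the majority class.
y_pred = [0] * 1000

accuracy, recall = evaluate(y_true, y_pred)
print(f"accuracy={accuracy:.3f}, minority recall={recall:.3f}")
# prints: accuracy=0.990, minority recall=0.000
```

In this sketch, overall accuracy is 99% even though every minority sample is misclassified. This is why evaluation for imbalanced data usually reports a per-class measure such as minority recall alongside overall accuracy.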