Deep Ensemble Architecture for High-Dimensional Class Imbalance Processing Using Modified SMOTE-ENC


Year : 2025

Publisher : Springer

Source Title : SN Computer Science

Document Type :

Abstract

High-dimensional (HD) data with class imbalance is common in many real-world domains such as medicine, finance, cybersecurity, computer vision, and natural language processing. Developing an efficient classifier is crucial to mitigate challenges such as data sparsity, bias towards the majority class, overfitting, and computational complexity, while ensuring accurate and fair classification across classes. Hence, in this paper we propose a novel “Deep Ensemble Architecture for High-dimensional Data Classification (DEA-HDDC)”. The proposed architecture consists of several stages: data preprocessing and class balancing, feature extraction, feature selection, and classification. In the initial stage, min-max normalisation is applied for data preprocessing, and a modified Synthetic Minority Oversampling Technique for Encoded Nominal and Continuous features (SMOTE-ENC) is proposed for data balancing. In the second stage, a feature extraction technique is deployed to derive meaningful features from the raw data, such as enhanced entropy measures, statistical features, and mutual information. In the third stage, a hybrid Kookaburra-Assisted Red Panda Optimisation (KARPO) algorithm is proposed, which efficiently selects relevant features by balancing exploration and exploitation. Finally, classification is performed using an ensemble classifier (LISB-EC) composed of LinkNet, an improved SqueezeNet, and a Bi-directional Long Short-Term Memory (Bi-LSTM) network. The final classification decision is made by averaging the outputs of these classifiers, leveraging the combined strengths of the ensemble to enhance accuracy and robustness. We compare the efficiency of the proposed model with state-of-the-art methods from the literature on benchmark datasets.
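The preprocessing and balancing stage combines min-max scaling with SMOTE-style oversampling. The following is a minimal sketch of those two steps for continuous features only, using classic SMOTE interpolation between minority-class neighbours; the paper's modified SMOTE-ENC additionally handles encoded nominal features, which this sketch does not attempt to reproduce.

```python
import numpy as np

def min_max_normalise(X):
    """Scale each feature to [0, 1]; the small epsilon guards constant columns."""
    return (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0) + 1e-12)

def smote_oversample(X_min, k=5, n_new=10, rng=None):
    """Generate synthetic minority samples by linear interpolation between a
    random minority sample and one of its k nearest minority-class neighbours
    (the core SMOTE rule: x_new = x_i + lam * (x_j - x_i), lam in [0, 1))."""
    rng = np.random.default_rng(rng)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # a sample is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]           # indices of the k nearest neighbours
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))            # pick a minority sample at random
        j = nn[i, rng.integers(k)]              # pick one of its neighbours
        lam = rng.random()                      # interpolation factor
        synth.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synth)
```

Because each synthetic point lies on a segment between two real minority samples, the oversampled set stays inside the convex hull of the minority class, which is what keeps SMOTE from inventing outliers.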
From the result analysis, the proposed model outperformed established baseline methods, including Random Oversampling, Random Undersampling, Tomek Links, and conventional SMOTE, achieving up to 97.7% accuracy, an F-measure of 0.98, a precision of 0.99, and an MCC of 0.97, compared with baseline ranges of 71.1%–80.6% accuracy and 0.568–0.749 F-measure, demonstrating statistically and practically significant improvements across all benchmark datasets.
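The ensemble decision rule stated in the abstract (averaging the outputs of the base classifiers) is standard soft voting. A minimal sketch, with hypothetical probability outputs standing in for the LinkNet, improved SqueezeNet, and Bi-LSTM models whose actual implementations are in the paper:

```python
import numpy as np

def ensemble_predict(prob_list):
    """Soft-voting ensemble: average per-class probability matrices from
    several classifiers, then take the arg-max class per sample."""
    avg = np.mean(np.stack(prob_list), axis=0)   # shape (n_samples, n_classes)
    return avg.argmax(axis=1), avg

# hypothetical per-class probabilities from three base classifiers on two samples
p1 = np.array([[0.9, 0.1], [0.4, 0.6]])
p2 = np.array([[0.7, 0.3], [0.2, 0.8]])
p3 = np.array([[0.8, 0.2], [0.6, 0.4]])
labels, avg = ensemble_predict([p1, p2, p3])
# sample 1 averages to (0.8, 0.2) and sample 2 to (0.4, 0.6), so labels = [0, 1]
```

Averaging probabilities rather than hard votes lets a confident classifier outweigh two uncertain ones, which is one reason soft voting tends to improve robustness over majority voting.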