HGR-FYOLO: a robust hand gesture recognition system for the normal and physically impaired person using frozen YOLOv5
Sen A., Dombe S., Mishra T.K., Dash R.
Article, Multimedia Tools and Applications, 2024, DOI Link
Hand gesture recognition is important in human-machine interaction (HMI), enabling users to interact with systems without physically touching them. However, current gesture recognition methods face several challenges, such as poor lighting conditions, complex backgrounds, low detection rates, and slow speed. Physically handicapped people with fewer fingers find it particularly difficult to engage with desktop applications against complex backgrounds. To address these concerns, we have chosen the YOLOv5s model, a small variant of the YOLOv5 (You Only Look Once) object detection algorithm, to detect and classify the hand region in real-time scenarios. We have fine-tuned the YOLOv5s architecture by freezing some convolutional layers in the backbone, which reduces the number of parameters, the model size, and the inference time. Two datasets, one public (American Sign Language) and one custom dataset named ‘NITR-Hand gesture’, have been utilized to evaluate the suggested work. There is always a trade-off between accuracy and detection speed: accuracy drops slightly after freezing the backbone of the YOLOv5s architecture, but both inference time and detection speed improve. Experimental results show that our frozen YOLOv5s model has achieved a mean average precision (mAP@50-95) of 92.60% at an average speed of more than 55 frames per second (fps). We have conducted a comparative analysis of our models against other state-of-the-art methods, and the fine-tuned YOLOv5s outperforms the other models in terms of mAP and inference speed. This prototype is highly beneficial for physically impaired persons interacting with systems in real time.
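The backbone freezing described above can be sketched in a few lines of PyTorch. This is an illustrative sketch, not the authors' code: the choice of the first ten modules as the backbone follows the standard YOLOv5s layout, and loading via torch.hub is an assumption, since the abstract only states that some backbone convolutional layers were frozen.

```python
import torch

# Load a pretrained YOLOv5s model (assumed entry point; the paper does not
# specify how the initial weights were obtained).
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# In YOLOv5s, modules model.0 through model.9 form the backbone; freezing
# them excludes their weights from gradient updates during fine-tuning.
freeze = [f'model.{i}.' for i in range(10)]
for name, param in model.named_parameters():
    param.requires_grad = not any(f in name for f in freeze)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f'Trainable parameters after freezing: {trainable:,}')
```

The Ultralytics YOLOv5 repository exposes the same idea through the `--freeze` flag of its train.py (e.g. `--freeze 10` to freeze the whole backbone).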
Deep Learning-Based Hand Gesture Recognition System and Design of a Human–Machine Interface
Sen A., Mishra T.K., Dash R.
Article, Neural Processing Letters, 2023, DOI Link
Hand gesture recognition plays an important role in developing effective human–machine interfaces (HMIs) that enable direct communication between humans and machines. In real-time scenarios, however, it is difficult to identify the correct hand gesture to control an application while the hands are moving. To address this issue, this work presents a low-cost, real-time hand gesture recognition system for human-computer interaction (HCI). The system consists of six stages: (1) hand detection, (2) gesture segmentation, (3) feature extraction and gesture classification using five pre-trained convolutional neural network (CNN) models and a vision transformer (ViT), (4) building an interactive human–machine interface (HMI), (5) development of a gesture-controlled virtual mouse, and (6) smoothing of the virtual mouse pointer using a Kalman filter. Five pre-trained CNN models (VGG16, VGG19, ResNet50, ResNet101, and Inception-V1) and ViT have been employed to classify hand gesture images, and two multi-class datasets (one public and one custom) have been used to validate the models. Comparing the models' performances, Inception-V1 shows significantly better classification performance than the other four CNN models and ViT in terms of accuracy, precision, recall, and F-score. We have also extended the system to control multimedia applications (such as the VLC player, an audio player, and the 2D Super-Mario-Bros game) with customized gesture commands in real-time scenarios. The average speed of the system reaches 25 fps (frames per second), which meets the requirements for real-time operation, and the proposed gesture control system achieves an average response time on the order of milliseconds for each control, making it suitable for real-time use. This prototype will benefit physically disabled people interacting with desktops.
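Stage (6), the Kalman-filter smoothing of the virtual mouse pointer, can be illustrated with a small constant-velocity filter built on OpenCV. This is a minimal sketch under assumed noise settings; the paper's abstract does not report the filter parameters used.

```python
import numpy as np
import cv2

# Constant-velocity Kalman filter: state = [x, y, vx, vy], measurement = [x, y].
kf = cv2.KalmanFilter(4, 2)
kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                [0, 1, 0, 1],
                                [0, 0, 1, 0],
                                [0, 0, 0, 1]], dtype=np.float32)
kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                 [0, 1, 0, 0]], dtype=np.float32)
kf.processNoiseCov = 1e-3 * np.eye(4, dtype=np.float32)      # assumed tuning
kf.measurementNoiseCov = 1e-1 * np.eye(2, dtype=np.float32)  # assumed tuning

def smooth_pointer(x, y):
    """Map a raw fingertip coordinate to a smoothed cursor position."""
    kf.predict()
    est = kf.correct(np.array([[x], [y]], dtype=np.float32))
    return float(est[0, 0]), float(est[1, 0])
```

Passing each raw fingertip detection through smooth_pointer before moving the on-screen cursor suppresses the frame-to-frame jitter of hand tracking.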
A novel hand gesture detection and recognition system based on ensemble-based convolutional neural network
Sen A., Mishra T.K., Dash R.
Article, Multimedia Tools and Applications, 2022, DOI Link
Nowadays, hand gesture recognition has become an alternative means of human-machine interaction, covering a wide range of applications such as 3D game technology, sign language interpretation, virtual reality (VR) environments, and robotics. Detecting the hand region, however, remains a challenging task in the computer vision and pattern recognition communities. Deep learning architectures such as convolutional neural networks (CNNs) have become a very popular choice for classification tasks, but they suffer from problems such as high variance during prediction, overfitting, and prediction errors. To overcome these problems, an ensemble of CNN-based approaches is presented in this paper. First, the gesture region is detected using a background separation method based on binary thresholding. The contour is then extracted, and the hand region is segmented. Next, the images are resized and fed into three individual CNN models, which are trained in parallel. Finally, the output scores of the CNN models are averaged to construct an optimal ensemble model for the final prediction. Two publicly available datasets (labeled Dataset-1 and Dataset-2) containing infrared images and one self-constructed dataset have been used to validate the proposed system. Experimental results are compared with existing state-of-the-art approaches, and the proposed ensemble model outperforms the existing methods.
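The pipeline described above (thresholding-based segmentation followed by score averaging) can be sketched as follows. The threshold value, input resolution, and tensor framework are assumptions, since the abstract does not fix them.

```python
import cv2
import numpy as np
import torch

def segment_hand(frame):
    """Separate the gesture region via binary thresholding and contour extraction."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)  # assumed threshold
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    return cv2.resize(frame[y:y + h, x:x + w], (64, 64))  # assumed input size

def ensemble_predict(image, models):
    """Average the output scores of the individual CNNs for the final prediction."""
    x = torch.from_numpy(image).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    probs = [torch.softmax(m(x), dim=1) for m in models]  # per-model class scores
    return torch.stack(probs).mean(dim=0).argmax(dim=1).item()
```

Averaging the softmax scores of independently trained models reduces the variance of any single CNN's prediction, which is the motivation for the ensemble step.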