Leveraging Spatio-temporal Deep Learning and Fuzzy Class Membership for Robust Content-Based Video Retrieval

Year : 2025

Publisher : Springer

Source Title : SN Computer Science

Abstract

The exponential growth of information and communication technologies has led to an unprecedented surge in digital data, a significant portion of which is unstructured visual media, such as images and videos, that often lacks metadata. The absence of structured metadata makes these vast repositories difficult to manage and to extract value from, and it renders traditional search and retrieval methods ineffective. Content-based video retrieval (CBVR) has emerged as a crucial solution, with applications in fields such as traffic analysis, video surveillance, medicine, and sports. Unlike static images, videos contain objects that move continuously and change appearance over time, which makes it challenging to capture both spatial details and temporal dynamics. This paper proposes a method that combines DenseNet-151, a densely connected convolutional neural network used to extract rich spatial features, with Long Short-Term Memory (LSTM), a recurrent neural network specialized in capturing temporal dependencies. In addition, a modified distance function incorporating fuzzy class membership is used to retrieve similar videos from the database, which improves retrieval performance. The proposed method is evaluated on the UCF101 Human Action Recognition dataset, where it achieves a 6% improvement in Precision, a 5% improvement in Recall, a 5% improvement in F1-Score, and an 8% improvement in AUC over the most competitive approaches in the literature.
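
The abstract does not give the form of the modified distance function, so the following is only an illustrative sketch of how fuzzy class membership might enter a retrieval distance. It assumes each video is already represented by a feature vector (for example, the output of a CNN and LSTM encoder) and a fuzzy membership vector over classes that sums to 1. The names `fuzzy_distance`, `membership_overlap`, `retrieve`, and the penalty weight `alpha` are hypothetical and are not taken from the paper; the penalty formula itself is an assumption chosen for simplicity.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def membership_overlap(mu_q, mu_v):
    """Fuzzy intersection (min operator) summed over classes.

    For membership vectors that each sum to 1, the result lies in [0, 1]:
    1.0 for identical memberships, 0.0 for disjoint ones.
    """
    return sum(min(p, q) for p, q in zip(mu_q, mu_v))

def fuzzy_distance(feat_q, mu_q, feat_v, mu_v, alpha=1.0):
    """Hypothetical membership-aware distance (an assumption, not the paper's formula).

    The Euclidean distance is inflated when the two videos disagree on
    class membership, so likely same-class videos rank closer.
    """
    base = euclidean(feat_q, feat_v)
    overlap = membership_overlap(mu_q, mu_v)
    return base * (1.0 + alpha * (1.0 - overlap))

def retrieve(query, database, k=2, alpha=1.0):
    """Return the ids of the k database videos closest to the query."""
    ranked = sorted(
        database,
        key=lambda item: fuzzy_distance(
            query["feat"], query["mu"], item["feat"], item["mu"], alpha
        ),
    )
    return [item["id"] for item in ranked[:k]]

# Toy data: two-dimensional features and two-class memberships (made up).
query = {"feat": [0.0, 0.0], "mu": [0.9, 0.1]}
database = [
    {"id": "A", "feat": [1.0, 0.0], "mu": [0.1, 0.9]},  # nearest, but other class
    {"id": "B", "feat": [1.2, 0.0], "mu": [0.8, 0.2]},  # slightly farther, same class
    {"id": "C", "feat": [3.0, 0.0], "mu": [0.9, 0.1]},  # same class, far away
]

top_two = retrieve(query, database, k=2)
print(top_two)
```

In this toy example, plain Euclidean distance would rank video A first, whereas the membership-aware distance ranks B ahead of A because B shares the query's dominant class. Setting `alpha=0` recovers the unmodified Euclidean ranking.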