A Novel Deep Learning-Based Key Frames Selection and Features Fusion for Content-Based Video Retrieval

Year : 2026

Publisher : Institute of Electrical and Electronics Engineers Inc.

Source Title : IEEE Access

Abstract

Content-Based Video Retrieval (CBVR) systems identify videos similar to a query by directly analyzing visual content, avoiding dependence on textual descriptions. In this paper, we propose a CBVR system built around a novel adaptive keyframe selection strategy aimed at preserving the most informative visual content. Unlike existing methods that employ fixed similarity thresholds and a uniform number of keyframes across all videos, the proposed approach dynamically determines both the similarity threshold and the number of keyframes for each video based on its content, enabling video-specific adaptability. First, a keyframe selection pipeline based on ResNet50 extracts deep features from individual video frames. Redundant frames are removed by retaining only those with minimal feature similarity, where the similarity threshold is determined automatically for each video using the Elbow method. Second, an optical flow–based frame selection strategy refines the selected keyframes by incorporating motion information. Deep spatiotemporal features are then extracted from both keyframe sets using a 3D deep learning model, and their fusion into a unified video-level representation constitutes an additional novel contribution, as it jointly exploits complementary content and motion cues rather than relying on a single information source. The proposed system achieved an 18.5% F1-score gain on the SumMe dataset for video summarization and improved video retrieval performance on UCF101 and HMDB51, with Top-10 accuracy gains of 20.4% and 34.2% and Top-20 mAP gains of 46.7% and 68.0%, respectively, as well as a 1.6% Top-10 precision gain on UCF101 over the closest competing methods.
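The adaptive thresholding step described in the abstract (per-video similarity threshold chosen with the Elbow method, then keyframes retained greedily) could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of cosine distance between consecutive frames, and the greedy retention rule are assumptions, and the ResNet50 feature extractor is stubbed out by precomputed feature vectors.

```python
import numpy as np

def elbow_threshold(values):
    """Elbow method sketch: sort the values and return the value at the
    point farthest (perpendicularly) from the line joining the two
    endpoints of the sorted curve."""
    v = np.sort(np.asarray(values, dtype=float))
    n = len(v)
    if n < 3:
        return float(v.mean())
    x = np.arange(n, dtype=float)
    p1 = np.array([x[0], v[0]])          # first point of the sorted curve
    p2 = np.array([x[-1], v[-1]])        # last point of the sorted curve
    d = (p2 - p1) / np.linalg.norm(p2 - p1)
    pts = np.stack([x, v], axis=1) - p1
    proj = np.outer(pts @ d, d)          # projection onto the endpoint line
    dist = np.linalg.norm(pts - proj, axis=1)
    return float(v[int(np.argmax(dist))])

def select_keyframes(features):
    """Greedy keyframe selection sketch: `features` stands in for
    ResNet50 frame embeddings (one row per frame). A frame becomes a
    keyframe when its cosine distance to the last kept keyframe exceeds
    the per-video elbow threshold, so both the threshold and the number
    of keyframes adapt to the video's content."""
    f = np.asarray(features, dtype=float)
    f = f / np.linalg.norm(f, axis=1, keepdims=True)
    # Cosine distances between consecutive frames drive the threshold.
    dists = 1.0 - np.sum(f[:-1] * f[1:], axis=1)
    thr = elbow_threshold(dists)
    keep = [0]
    for i in range(1, len(f)):
        if 1.0 - float(f[keep[-1]] @ f[i]) > thr:
            keep.append(i)
    return keep
```

On synthetic features with three well-separated "shots" (near-identical vectors within a shot, large jumps between shots), the elbow lands just above the within-shot distances, so one keyframe per shot plus the first frame is retained; the optical flow refinement and 3D feature fusion stages of the paper are not modeled here.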