Abstract
Despite recent advances in assistive technologies, improving spatial perception for the visually impaired remains a major challenge. While several methods have been proposed, a holistic real-time solution is still the subject of intense research. In this paper, we present an approach that combines computer vision, speech recognition, and artificial intelligence. At the core of our system is the YOLOv4 model integrated with OpenCV, carefully optimized for robust object recognition in heterogeneous environments. We introduce a distance estimation algorithm that uses the focal length of the device’s camera to infer the actual distance of a detected object from its dimensions within the image. To enhance user interaction, we integrate Google’s PaLM API, the model underlying Bard, which provides rich auditory descriptions of the environment. The system incorporates further functionalities, including hazard and speed detection with a dual OS mode, and flexibly adapts to diverse network bandwidths through optimized configurations. Our paper offers a new paradigm in the field of assistive technology and sets a benchmark for future efforts aimed at reducing the barriers created by visual impairments.
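The focal-length distance estimation mentioned above follows the standard pinhole-camera relation: distance = (real object width × focal length in pixels) / perceived width in pixels. The sketch below illustrates this relation; the function names, calibration routine, and numeric values are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch of focal-length-based distance estimation
# (pinhole-camera model). All names and values are hypothetical.

def calibrate_focal_length(perceived_width_px: float,
                           known_distance_cm: float,
                           known_width_cm: float) -> float:
    """One-time calibration: image an object of known width at a known
    distance, then focal_px = (pixel_width * distance) / real_width."""
    return (perceived_width_px * known_distance_cm) / known_width_cm

def estimate_distance_cm(known_width_cm: float,
                         focal_length_px: float,
                         perceived_width_px: float) -> float:
    """Invert the same relation: distance = (real_width * focal_px) / pixel_width.
    perceived_width_px would come from the detector's bounding box."""
    return (known_width_cm * focal_length_px) / perceived_width_px

# Example: a 20 cm object appears 200 px wide at 100 cm -> focal = 1000 px.
focal = calibrate_focal_length(200.0, 100.0, 20.0)
# The same object later appears 100 px wide -> estimated distance 200 cm.
print(estimate_distance_cm(20.0, focal, 100.0))
```

In practice, the perceived width would be taken from the width of the YOLOv4 bounding box for the detected object class, with a known average physical width assumed per class.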