Dr Gyanajyoti Routray

Assistant Professor

Department of Computer Science and Engineering

Contact Details

gyanajyoti.r@srmap.edu.in

Office Location

Homi J Bhabha Block, Level 3, Cubicle No: 15

Education

2023
Ph.D
Indian Institute of Technology, Kanpur
2014
M.Tech
Biju Patnaik University of Technology
India
2012
B.Tech
Biju Patnaik University of Technology
India

Experience

  • June 2023 - Jan 2025 – Project Engineering – Indian Institute of Technology Kanpur
  • Feb 2010 - July 2016 – Assistant Professor – C. V. Raman College of Engineering, Bhubaneswar (at present C. V. Raman Global University, Bhubaneswar)
  • Dec 2005 - Aug 2008 – O&M Engineering – Ericsson India Pvt. Ltd

Research Interest

  • Acoustic data acquisition and multi-channel processing, focusing on technologies related to microphone arrays and 6DoF acoustic data for beamforming, localization, tracking, and source enhancement in challenging acoustic scenarios
  • Immersive audio rendering and human perception using loudspeaker- and headphone-based immersive audio simulation, acoustic reproduction systems, and binaural auditory models

Awards

No data available

Memberships

  • IEEE

Publications

  • Sparse Bayesian Integrated CNN Framework for Enhanced Acoustic Source Localization

    Dr Gyanajyoti Routray, Priyadarshini Dwivedi; Gyanajyoti Routray; Rajesh M. Hegde

    Source Title: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP),

    This paper presents a novel framework for super-resolution direction of arrival (DOA) estimation of acoustic sources in the spherical harmonics (SH) domain. The proposed approach combines sparse Bayesian learning (SBL) with a convolutional neural network (CNN). The CNN is utilized to classify DOA based on the spherical harmonics decomposition (SHD) of recordings from a spherical microphone array (SMA), providing coarse DOA estimates. These estimates are then refined using SBL, which operates on a densely sampled grid around the CNN-predicted DOA classes to achieve precise localization. The CNN component exhibits robustness in noisy and reverberant environments, while SBL specializes in high-resolution localization of multiple sparse sources. By leveraging the strengths of both methods, the SH-CNN-SBL framework enhances DOA estimation accuracy in challenging conditions. Extensive simulations and real-world experiments are performed to validate the effectiveness of the proposed method in achieving a resolution of 1°.
  • Feature Transformation for Fast and Efficient Learning in Near-Field Source Localization

    Dr Gyanajyoti Routray, Priyadarshini Dwivedi, Gyanajyoti Routray, Rajesh M Hegde

    Source Title: IEEE Transactions on Artificial Intelligence, Quartile: Q1

    This work develops an efficient and fast learning method for near-field acoustic source localization using the spherical harmonics (SH) feature transformation. The SH features are derived through the SH decomposition of the microphone array recordings. However, these SH features are often impaired by noise, interference, and reverberation, hindering localization accuracy. In this context, we propose a feature transformation that leverages the signal subspace of the SH decomposed signals. The feature transformation reduces the irregularities in the decomposed SH parameters under noisy and reverberant conditions. Therefore, the proposed subspace-based feature captures significant directional and range-dependent cues for localization and enhances the training performance and accuracy with fewer epochs. The efficacy of this fast learning approach is demonstrated using convolutional neural network (CNN) training to map the input features to localization classes. The performance of the proposed approach is evaluated through exhaustive simulations and experiments and compared with previous methods.
  • Sparsity-driven loudspeaker gain optimization for sound field reconstruction with spherical microphone array

    Dr Gyanajyoti Routray, Gyanajyoti Routray and Rajesh M Hegde

    Source Title: Digital Signal Processing, Volume 154, 104688, 2024. ISSN 1051-2004, Quartile: Q2

    The paper presents a sparsity-driven method utilizing loudspeakers to reconstruct spatial sound fields using measurements obtained from a spherical microphone array (SMA). Employing spherical harmonics decomposition (SHD), the SMA recordings are characterized in the spherical harmonics domain. The gains for the loudspeakers are determined through an optimization problem, equating spherical harmonics pressure coefficients from primary and secondary sources. Furthermore, the sparsity within the loudspeaker feeds is redefined as a constrained sparse optimization problem, integrating linearity and orthogonality constraints. This method effectively reduces the required loudspeakers while maintaining sound field quality. The Bregman iteration method is applied to solve the constrained optimization problem. Rigorous evaluation based on reconstructed sound fields and objective measures highlights significant enhancements compared to least square and compressed sensing methods.
  • Improving Source Tracking Accuracy through Learning-Based Estimation Methods in SH Domain: A Comparative Study

    Dr Gyanajyoti Routray, Priyadarshini Dwivedi, Gyanajyoti Routray, Devansh Kumar Jha, and Rajesh M. Hegde

    Source Title: IEEE Transactions on Artificial Intelligence, vol. 5, no. 8, pp. 3974-3984, Aug. 2024, Quartile: Q1

    Acoustic source tracking is significant across applications like surveillance, teleconferencing, and robot audition, yet the complexity introduced by reverberation, background noise, and overlapping sources impedes precise source localization. This article uses learning-based localization methods to introduce a resilient and intelligent acoustic source tracking approach in the spherical harmonics (SHs) domain. The tracking algorithms anticipate moving source locations by leveraging past predictions and direction of arrival (DOA) estimations. The prediction probability is computed through alpha–beta and Kalman filtering applied to the estimated DOAs, which are likelihood probabilities obtained from learning models. Utilizing the spatial attributes of sound sources encoded in SH signals, diverse learning-based frameworks are introduced to capture the intricate relationship between SH features and source locations. Supervised learning is utilized to train the models that minimize localization errors between predicted and ground truth positions. Experimental assessments underscore the efficacy and resilience of our proposed approach, conducted using LOCAlization and TrAcking (LOCATA) data, revealing a substantial enhancement in tracking accuracy compared to baseline methods.
  • Octant Spherical Harmonics Features for Source Localization using Artificial Intelligence based on Unified Learning Framework

    Dr Gyanajyoti Routray, Priyadarshini Dwivedi, Gyanajyoti Routray, and Rajesh M. Hegde

    Source Title: IEEE Transactions on Artificial Intelligence, vol. 5, no. 8, pp. 3845-3857, Aug., Quartile: Q1

  • Learning-Based Masking for Reliable Source Localization Interfered by Undesired Directional Noise

    Dr Gyanajyoti Routray, Priyadarshini Dwivedi; Gyanajyoti Routray; Siddesh B. Hazare; Rajesh M. Hegde

    Source Title: Forum Acusticum, 10th Convention of the European Acoustics Association,

    It is incredibly challenging to simultaneously locate an acoustic source in a noisy, reverberant environment and mitigate directional interference. The proposed study uses a spherical harmonic decomposition method to determine the spherical harmonics phase magnitude (SH-PM) components corresponding to the received spherical microphone array (SMA) signals. Before SH-PM components are used as input features to the CNN model, binary masking removes directional interference and emphasizes the desired audio source. In this work, the binary mask is estimated using a learning technique so that acceptable and undesired sources can be reliably discriminated using real-time mask estimation. The proposed strategy creates a learning-based mask to enable real-time and reliable filtering of the undesirable source. Because of this, the entire strategy is extremely flexible and adaptable. Extensive simulations on generated datasets evaluate the effectiveness of the proposed strategy. Additionally, the approach is experimentally validated by conducting tests in a live lab setting. The significance of the suggested strategy promotes the use of the technique in real-world situations.
  • Intelligent sniper localization technique using convolutional neural network

    Dr Gyanajyoti Routray, Priyadarshini Dwivedi, Reddy Rakesh, Gyanajyoti Routray, Rajesh M. Hegde

    Source Title: Inter-Noise 2023, 52nd International Congress and Exposition on Noise Control Engineering,

    A reliable methodology for sniper localization utilizing a machine-learning framework is proposed herein. The proposed work is based on the time of arrival of the shock waves (SW), in contrast to the conventional approaches, which utilize both the muzzle blast (MB) and SW. Since the MB is susceptible to environmental disturbances, relying on the SW alone makes the proposed solution robust and reliable. The SWs are captured with a 2D non-linear array, and the time delay between the microphones is approximated using the generalized cross-correlation phase transform (GCC-PHAT). Subsequently, a convolutional neural network (CNN) model is trained to map the input GCC-PHAT features to the sniper position. Adopting the CNN model makes the method robust, and it also performs better in highly noisy environments. The proposed technique shows a significant improvement over the conventional methods, motivating its use in practical applications.
  • CRNN-based spatial active noise control in spherical harmonics domain

    Dr Gyanajyoti Routray, Gyanajyoti Routray, Siddhesh Bharat Hazare, Priyadarshini Dwivedi, Rajesh M Hegde

    Source Title: Inter-Noise 2023, 52nd International Congress and Exposition on Noise Control Engineering,

    In this work, an active noise control (ANC) system in the spherical harmonics (SH) domain, along with a learning model, is developed. Traditional ANC systems are built on the adaptive signal processing framework and fail in the presence of nonlinear distortions. Further, these algorithms are limited to the horizontal plane only. In contrast, the proposed work develops the ANC in 3D space using the SH representation with the spherical microphone array (SMA) as the error microphone. The SH decomposition calculates the SH pressure coefficients from the SMA recordings. Subsequently, a convolutional recurrent neural network (CRNN) is trained to estimate the real and imaginary spectrograms from the SH coefficients as the input features. The output of the CRNN is the cancelling signal that eliminates or attenuates the primary noise in the ANC system. A delay-compensated method addresses ANC system latency. According to simulations and experiments, the proposed learning-based spatial ANC reduces wideband noise and generalizes to untrained noises.
  • Diversity Minimization Technique for Multiple Measurement Vector-based Super-resolution spatial Audio Imaging

    Dr Gyanajyoti Routray, Gyanajyoti Routray, Priyadarshini Dwivedi, Rajesh M Hegde

    Source Title: Forum Acusticum, 10th Convention of the European Acoustics Association,

    Ambisonics is an efficient spatial sound acquisition and reproduction technique in the spherical harmonic domain. At low frequencies, lower-order ambisonics reproduction is accurate, but at high frequencies, the spatial resolution suffers. An increase in frequency shrinks the radius of the error-free region and degrades the spatial resolution. Higher-order ambisonics (HOA) provides better spatial resolution in this context. However, spatial sound acquisition in HOA is constrained by hardware complexity and storage space, in contrast to low-order ambisonics (B-format). So, it is worthwhile to acquire the sound scene at low order to reduce hardware complexity and storage requirements, and to upscale to a higher order during reproduction to improve the spatial resolution. This work investigates algorithms based on minimizing the diversity measures for obtaining higher-order ambisonics from the B-format signals. In particular, we are interested in the FOCUSS (FOCal Underdetermined System Solver) class of algorithms, which is an alternative and complementary approach to the sequential forward method. Also, a more robust regularized FOCUSS algorithm for the sparse inverse problem is investigated further. The performance of the proposed upscaling method is evaluated using the mean square error metrics. The subjective evaluation is
  • Sniper Localization using Acoustic Signal Processing based on Time of Arrivals

    Dr Gyanajyoti Routray, Reddy Rakesh; Gyanajyoti Routray; Priyadarshini Dwivedi; Rajesh M. Hegde

    Source Title: 2023 National Conference on Communications (NCC),

    This paper presents a robust sniper localization technique based on the time of arrival of the shock waves, unlike the conventional methods that need both muzzle blast and shock wave. In the proposed work, a two-dimensional array geometry is considered to avoid solving non-linear equations, as is required in the case of the linear array. Since the bullet’s velocity is not constant throughout the trajectory, the deceleration parameter is also considered in the proposed work to improve the accuracy compared to conventional methods. The times of arrival of the shock waves are measured from the generalized cross-correlation phase transform (GCC-PHAT) between microphone signal recordings. Extensive simulations are carried out to evaluate the proposed work and compare it with the baseline method. The proposed method is observed to perform better than the previous method.
  • Long-Term Temporal Audio Source Localization using SH-CRNN

    Dr Gyanajyoti Routray, Priyadarshini Dwivedi; Siddesh Bharat Hazare; Gyanajyoti Routray; Rajesh M. Hegde

    Source Title: 2023 National Conference on Communications (NCC),

    Acoustic source localization in a noisy and reverberant environment is still a challenging problem in signal processing. An improved technique is developed herein that explores the convolutional recurrent neural network (CRNN) in the spherical harmonics domain for far-field direction of arrival (DOA) estimation. The source signal is recorded using a spherical microphone array (SMA), and the spherical harmonics decomposition (SHD) of the recordings yields the spherical harmonics (SH) pressure coefficients. Subsequently, the SH phase and magnitude coefficients are calculated. The CRNN model is designed and trained with long-term temporal SH magnitude and phase coefficients across all the frequencies to classify these features corresponding to the source locations. The proposed technique is assessed by extensive simulations and experimental analysis at various signal-to-noise ratios (SNR) and reverberation times (RT60). The root mean square error (RMSE) is evaluated for the proposed DOA estimation technique, and a comparison with the state-of-the-art methods shows a significant improvement in the localization of the audio source.
  • Learning based method for near field acoustic range estimation in spherical harmonics domain using intensity vectors

    Dr Gyanajyoti Routray, Priyadarshini Dwivedi, Gyanajyoti Routray, and Rajesh M. Hegde

    Source Title: Pattern Recognition Letters Volume 165, January 2023, Pages 17-24, Quartile: Q1

    Near-field acoustic range estimation is considered one of the least explored research problems in digital signal processing under noise and reverberant conditions. This letter develops a new learning-based range estimation technique utilizing the spherical harmonics intensity (SH-INT) coefficients. The conventional range estimation in the spherical harmonics (SH) domain relies on the pressure coefficients. However, at high frequencies, these coefficients of different order and range overlap and hinder the accuracy of range estimation. On the contrary, the SH-INT coefficients are well distinguished at high frequencies for various orders and ranges, making these features favorable for accurate range estimation using learning algorithms. Since the SH-INT coefficients in the radial direction are independent of the source signal and vary with range, a convolutional neural network (CNN) model has been adopted to map the SH-INT coefficients with the range classes. The performance of the proposed spherical harmonic intensity (SH-INT) features in the context of near-field range estimation is validated by conducting exhaustive experiments on simulated and real data. Further, the error in near-field source range estimates is characterized using root mean square error (RMSE) criteria. The results are impactful and encourage the use of this method for practical near-field source range estimation applications.
  • Spatial audio reproduction over ad hoc loudspeaker array using near-field compensation in spherical harmonics domain

    Dr Gyanajyoti Routray, Gyanajyoti Routray and Rajesh M Hegde

    Source Title: Digital Signal Processing Volume 142, October 2023, 104203, Quartile: Q2

    Spatial audio reproduction using the loudspeaker array introduces the curvature effect, leading to a distorted listening experience when the listener is in the near field. In the near field, the loudspeakers are approximated as point sources (spherical wave) and amplify the mode vectors. Further, the problem becomes more challenging for the irregular loudspeaker arrangement, which causes uneven energy distribution in the reproduction region. In this context, a near-field compensation is applied to the encoded ambisonics coefficients. An optimization problem is formulated such that the loudspeaker gains encoded with spherical harmonics basis coefficients match the target ambisonics coefficients. Further, the in-phase and quadrature components of the energy localization vector are imposed as constraints to direct maximum energy into the reproduction region. The solution to the optimization problem is obtained using a derivative-free optimization solver. The performance of the proposed methods is evaluated for ITU-R recommended loudspeaker layouts using technical and perceptual evaluation attributes.
  • Spherical harmonics domain-based approach for source localization in presence of directional interference

    Dr Gyanajyoti Routray, Priyadarshini Dwivedi, Gyanajyoti Routray, and Rajesh M. Hegde

    Source Title: JASA Express Letters, Quartile: Q1

    This paper presents a learning-based method for source localization in the presence of directional interference under reverberant and noisy conditions. The proposed method operates on the spherical harmonic decomposition of the spherical microphone array recordings to yield spherical harmonics coefficients as the features. An attention mechanism is incorporated through a binary mask that filters out the dominant undesired source components from the features before training. A convolutional neural network is trained to map the phase and magnitude of the filtered coefficients with the location class. Hence, the objective is to develop the binary mask followed by source localization.
  • Hybrid SH-CNN-MP approach for super resolution DOA estimation

    Dr Gyanajyoti Routray, Priyadarshini Dwivedi, Gyanajyoti Routray, Rajesh M Hegde

    Source Title: 2022 56th Asilomar Conference on Signals, Systems, and Computers,

    A novel framework for super-resolution direction of arrival (DOA) estimation of acoustic sources in the spherical harmonics (SH) domain is addressed in this work. The proposed method is developed in two stages. First, a convolutional neural network (CNN) model is investigated to obtain the DOA classes from the spherical harmonics decomposition (SHD) of the spherical microphone array (SMA) recordings. Subsequently, the matching pursuit (MP) algorithm with a high-resolution search grid corresponding to the DOA classes is applied to the SHD signals to localize the acoustic source. Since the CNN model performs better in noisy and reverberant environments and the MP algorithm uses the orthogonal property of the SH basis function to provide high-resolution localization, the proposed hybrid model takes advantage of both these models. Extensive simulations and real-time experiments are performed to validate the performance of the proposed model.
  • Joint DOA Estimation in Spherical Harmonics Domain using Low Complexity CNN

    Dr Gyanajyoti Routray, Priyadarshini Dwivedi, Raj Prakash Gohil, Gyanajyoti Routray, Vishnuvardhan Varanasiy, Rajesh M Hegde

    Source Title: 2022 IEEE International Conference on Signal Processing and Communications (SPCOM),

    Direction of arrival (DOA) estimation for multi-channel speech enhancement is a challenging problem. In this context, this paper proposes a new method for joint DOA estimation using a low complexity convolutional neural network (CNN) architecture. The spherical harmonic (SH) coefficients of the received speech signal are obtained from the spherical harmonics decomposition (SHD). The magnitude and phase features are extracted from these SH coefficients and combined as a single feature for training the CNN. A single CNN model is trained using these combined features in contrast to two CNN models used in earlier work. Both azimuth and elevation are then obtained for estimation of DOA from this single CNN. Extensive simulations are also conducted for the performance evaluation of the proposed low complexity CNN model. It is observed that the proposed CNN model provides robust DOA estimates at the various signal to noise ratios (SNR) and reverberation times with reduced computational complexity. Performance evaluated in terms of the gross error (GE) and run-time complexity also provides interesting results motivating the use of the proposed model in practical applications.
  • DOA Estimation using Multiclass-SVM in Spherical Harmonics Domain

    Dr Gyanajyoti Routray, Priyadarshini Dwivedi; Gyanajyoti Routray; Rajesh M. Hegde

    Source Title: 2022 IEEE International Conference on Signal Processing and Communications (SPCOM),

    Direction of arrival (DOA) estimation is still a challenging and fundamental problem in acoustic signal processing. This paper proposes a new method for DOA estimation that utilizes support vector machine (SVM) based classification. The source signal is recorded by the spherical microphone array (SMA) and decomposed into the spherical harmonics domain. The phase and the magnitude features are calculated from the spherical harmonics (SH) decomposed signals. A multiclass support vector machine (M-SVM) algorithm is implemented to classify these phase and magnitude features into the DOA classes. Since the SVM is a non-probabilistic and deterministic model, it is computationally faster and far less complex than the neural network-based learning models. Extensive simulations are conducted for the performance evaluation of the proposed method. It is observed that the proposed model provides robust DOA estimates at various signal-to-noise ratios (SNR) and reverberation times. Performance evaluated in terms of the root mean square error (RMSE) provides interesting results motivating the use of the proposed model in practical applications.
  • Upscaling HOA Signals using Order Recursive Matching Pursuit in Spherical Harmonics Domain

    Dr Gyanajyoti Routray, Gyanajyoti Routray, Sumit Kumar Sahu, Rajesh M Hegde

    Source Title: 2022 IEEE International Conference on Signal Processing and Communications (SPCOM),

    Spatial sound acquisition in Higher-Order Ambisonics (HOA) is constrained by hardware complexity and storage space. In contrast, low-order ambisonics (B-format signals) suffers from low spatial resolution. So it is worthwhile to acquire the sound at low order to reduce hardware complexity and storage requirements, and to upscale to a higher order during reproduction to improve the spatial resolution. In this work, a sparse framework is formulated that efficiently uses the Order Recursive Matching Pursuit (ORMP) algorithm for Multiple Measurement Vectors (MMV) to decompose the low-order encoded signal. Subsequently, the upscaled HOA signal is obtained from the decomposed low-order ambisonics to reproduce the spatial audio with high spatial resolution. The performance of the proposed upscaling method is evaluated using metrics such as the Mean Square Error (MSE) in the upscaled signals and the error in the reproduced sound field. The subjective evaluation is carried out using a listening test and compared with state-of-the-art methods.
  • Far-field Source Localization in Spherical Harmonics Domain using Acoustic Intensity Vector

    Dr Gyanajyoti Routray, Priyadarshini Dwivedi, Gyanajyoti Routray, Rajesh M Hegde

    Source Title: 24th International Congress on Acoustics (ICA 2022),

    Source localization in reverberant and noisy environments is still a challenging research problem and has various signal processing applications. This paper proposes a novel far-field source localization method using the acoustic intensity vector in the spherical harmonics domain. The mathematical model for the sound pressure captured by the spherical microphone array (SMA) is first developed in the spherical harmonics domain. Subsequently, the acoustic intensity vector is derived from the spherical harmonics decomposition of the pressure and acoustic velocity. As the acoustic velocity efficiently preserves the directional information, the intensity vector also contains directional and energy information. The dependency of location on the intensity vector is further explored. Since the intensity vector in the azimuth and elevation plane varies with the location, a unified convolutional neural network (CNN) model is selected to map the intensity features to the locations in reverberant and noisy conditions. Extensive simulations and experiments are conducted on both simulated and real speech data to evaluate the performance of the proposed localization method. The results show a significant improvement in localization accuracy and mean square error (MSE) compared with the state-of-the-art methods.
  • Binaural Reproduction of HOA Signal using Temporal Convolutional Networks

    Dr Gyanajyoti Routray, Gyanajyoti Routray, Priyadarshini Dwivedi, Raj Prakash Gohil, Rajesh M Hegde

    Source Title: 24th International Congress on Acoustics (ICA 2022),

    In this work, a temporal convolutional network (TCN) based binaural reproduction of higher-order ambisonics (HOA) signals in the spherical harmonics (SH) domain is proposed. The binaural rendering is characterized by the head-related transfer function (HRTF). Since the HRTFs cannot be measured for all the directions, it limits error-free binaural reproduction. The proposed work presents a data-driven approach to learning binaural cues from the anthropometric parameter and source directions. The task is to estimate masking functions that transform the higher-order ambisonics (HOA) signals into binaural signals. The learning framework takes the HOA signals as the input along with the anthropometric parameters to generate the binaural signals. In the proposed method, the TCN implicitly learns the HRTFs parameter and produces the binaural signal. The performance of the method is evaluated based on the reproduction accuracy and mean square error (MSE). Further real-time experiments are carried out using the CIPIC HRTF dataset and the binaural recording using the autogenously developed bionic ears to validate the performance of the proposed method.
  • Binaural Source Localization in Median Plane using Learning based Method for Robot Audition

    Dr Gyanajyoti Routray, Priyadarshini Dwivedi, Gyanajyoti Routray, Rajesh M Hegde

    Source Title: 24th International Congress on Acoustics (ICA 2022),

    This article presents a learning-based binaural source localization technique in the median plane and its application to robot audition. Binaural recordings capture the audio signal and acoustic transfer function from the source to the ears, known as the head-related transfer function (HRTF), which parameterizes spatial cues such as interaural time difference (ITD) and interaural level difference (ILD). ITD and ILD cues are prominent for source localization in the horizontal plane. Since ITD and ILD are nearly equal to zero in the median plane (the ear canal of both the ears is colocated), the localization is complex. Therefore, monaural spectral cues such as spectral notches are investigated for median plane source localization. The spectral notch represents the delay between the direct and the reflected wave. As it varies with the elevation angle, a learning-based model is developed to map the spectral notch to the elevation angle. The spectral notch features are extracted from the binaural recording using linear prediction cepstral coefficients (LPCC) and linear prediction residual coefficients (LPRC). Simulations and experiments are carried out using high-spatial-resolution HRTF measurements from the CIPIC dataset to evaluate the performance. The results show a significant improvement in localization accuracy compared with existing methods.

Patents

Projects

Scholars

Interests

  • Embedded Systems
  • Internet of Things
  • Wireless Sensor Networks

Thought Leaderships

There are no Thought Leaderships associated with this faculty.

Top Achievements

Research Area

No research areas found for this faculty.

Contact Details

gyanajyoti.r@srmap.edu.in

Scholars
Interests

  • Embedded Systems
  • Internet of Things
  • Wireless Sensor Networks
