64+ open-access research outputs.
Vision-language models (VLMs) are increasingly being adopted for end-to-end autonomous driving systems due to their exceptional performance in handling long-tail scenarios. However, current VLM-based โฆ
Vision-Language Navigation in Continuous Environments (VLN-CE) requires agents to learn complex reasoning from long-horizon human interactions. While Multi-modal Large Language Models (MLLMs) have driโฆ
Huntington's disease (HD) is caused by CAG-repeat expansion in HTT, which lengthens the polyglutamine (polyQ) tract in huntingtin (HTT) and promotes misfolding and aggregation. While polyQ-length-depeโฆ
Spoofing-robust automatic speaker verification (SASV) seeks to build automatic speaker verification systems that are robust against both zero-effort impostor attacks and sophisticated spoofing techniqโฆ
This work focuses on assessing the information-theoretic limits of scene parameter estimation in plenoptic imaging systems. A general framework to compute lower bounds on the parameter estimation erroโฆ
Forestry plays a vital role in our society, creating significant ecological, economic, and recreational value. Efficient forest management involves labor-intensive and complex operations. One essentiaโฆ
Data scarcity remains a fundamental barrier to achieving fully autonomous surgical robots. While large scale vision language action (VLA) models have shown impressive generalization in household and iโฆ
Planetary rover exploration is attracting renewed interest with several upcoming space missions to the Moon and Mars. However, a substantial amount of data from prior missions remain underutilized forโฆ
We present SAGA, a versatile and adaptive framework for visuomotor control that can generalize across various environments, task objectives, and user specifications. To efficiently learn such capabiliโฆ
This paper proposes a constant false alarm rate (CFAR) target detection algorithm based on the CLEAN concept and truncated statistics to mitigate the non-homogeneity of reference samples caused by sidโฆ
Dexterous manipulation requires precise geometric reasoning, yet existing visuo-tactile learning methods struggle with sub-millimeter precision tasks that are routine for traditional model-based approโฆ
Versatile audio super-resolution (SR) aims to predict high-frequency components from low-resolution audio across diverse domains such as speech, music, and sound effects. Existing diffusion-based SR mโฆ
Voice activity detection (VAD) is essential in speech-based systems, but traditional methods detect only speech presence without identifying speakers. Target-speaker VAD (TS-VAD) extends this by detecโฆ
We explore the performance of several state-of-the-art automatic speech recognition (ASR) models on a large-scale Arabic speech dataset, the SADA (Saudi Audio Dataset for Arabic), which contains 668 hโฆ
Multi-robot localization is a crucial task for implementing multi-robot systems. Numerous researchers have proposed optimization-based multi-robot localization methods that use camera, IMU, and UWB seโฆ
Deep neural networks are increasingly applied in automated histopathology. Yet, whole-slide images (WSIs) are often acquired at gigapixel sizes, rendering them computationally infeasible to analyze enโฆ
Reliable planning is crucial for achieving autonomous driving. Rule-based planners are efficient but lack generalization, while learning-based planners excel in generalization yet have limitations in โฆ
The objective of automatic speaker verification (ASV) systems is to determine whether a given test speech utterance corresponds to a claimed enrolled speaker. These systems have a wide range of applicโฆ
Medical image segmentation plays an important role in various clinical applications; however, existing deep learning models face trade-offs between efficiency and accuracy. Convolutional Neural Networโฆ
Reliable collision avoidance under extreme situations remains a critical challenge for autonomous vehicles. While large language models (LLMs) offer promising reasoning capabilities, their applicationโฆ
Free open-access publishing with Google Scholar indexing.
Submission Guide โ