Recognition · Engineering · Preprint — Research Repository

Engineering Preprint PDF DOI

LRS-VoxMM: A benchmark for in-the-wild audio-visual speech recognition

Doyeop Kwak, Jeongsoo Choi, Suyeon Lee, Joon Son Chung · 2026

We introduce LRS-VoxMM, an in-the-wild benchmark for audio-visual speech recognition (AVSR). The benchmark is derived from VoxMM, a dataset of diverse real-world spoken conversations with human-annota…



SASI: Leveraging Sub-Action Semantics for Robust Early Action Recognition in Human-Robot Interaction

Yongpeng Cao, Masahiro Hirano, Hyuno Kim, Yuji Yamakawa · 2026

Understanding human actions is critical for advancing behavior analysis in human-robot interaction. Particularly in tasks that demand quick and proactive feedback, robots must recognize human actions …



BUT System Description for CHiME-9 MCoRec Challenge

Dominik Klement, Alexander Polok, Nguyen Hai Phong, Prachi Singh, Lukas Burget · 2026

Multi-talker automatic speech recognition (ASR) in conversational recordings remains an open problem, particularly in scenarios with a large portion of overlapping speech where identifying and transcrib…



FeatureFox: Sample-Efficient Panoptic Graph Segmentation for Machining Feature Recognition in B-Rep 3D-CAD Models

Bertram Fuchs, Altay Kacan, Aaron Haag, Oliver Lohse · 2026

Automatic feature recognition (AFR) on B-Rep 3D-CAD models is central to CAD/CAM automation, yet most learning-based methods are complex, data-hungry, and evaluate instance grouping and semantic label…



UNet-Based Fusion and Exponential Moving Average Adaptation for Noise-Robust Speaker Recognition

Chong-Xin Gan, Peter Bell, Man-Wai Mak, Zhe Li, Zezhong Jin, Zilong Huang, Kong Aik Lee · 2026

The joint training of speech enhancement and speaker embedding networks for speaker recognition is widely adopted under noisy acoustic environments. While effective, this paradigm often fails to lever…



Intention-Aware Semantic Agent Communications for AI Glasses

Peiwen Jiang, Fangyu Liu, Jiajia Guo, Chao-Kai Wen, Shi Jin, Jun Zhang · 2026

Smart glasses are emerging as a promising interface between humans and artificial intelligence (AI) agents, enabling first-person perception, contextual awareness, and real-time assistance. However, c…



Explainable AI in Speaker Recognition -- Making Latent Representations Understandable

Yanze Xu, Wenwu Wang, Mark D. Plumbley · 2026

Neural networks can be trained to learn task-relevant representations from data. Understanding how these networks make decisions falls within the Explainable AI (XAI) domain. This paper proposes to st…



DM-ASR: Diarization-aware Multi-speaker ASR with Large Language Models

Li Li, Ming Cheng, Weixin Zhu, Yannan Wang, Juan Liu, Ming Li · 2026

Multi-speaker automatic speech recognition (ASR) aims to transcribe conversational speech involving multiple speakers, requiring the model to capture not only what was said, but also who said it and s…



Advancing automatic speech recognition using feature fusion with self-supervised learning features: A case study on Fearless Steps Apollo corpus

Szu-Jui Chen, John H.L. Hansen · 2026

Using self-supervised learning (SSL) models has significantly improved performance for downstream speech tasks, surpassing the capabilities of traditional hand-crafted features. This study investigate…



From Noise to Intent: Anchoring Generative VLA Policies with Residual Bridges

Yiming Zhong, Yaoyu He, Zemin Yang, Pengfei Tian, Yifan Huang, Qingqiu Huang, Xinge Zhu, Yuexin Ma · 2026

Bridging high-level semantic understanding with low-level physical control remains a persistent challenge in embodied intelligence, stemming from the fundamental spatiotemporal scale mismatch between …



A Deployable Embodied Vision-Language Navigation System with Hierarchical Cognition and Context-Aware Exploration

Kuan Xu, Ruimeng Liu, Yizhuo Yang, Denan Liang, Tongxing Jin, Shenghai Yuan, Chen Wang, Lihua Xie · 2026

Bridging the gap between embodied intelligence and embedded deployment remains a key challenge in intelligent robotic systems, where perception, reasoning, and planning must operate under strict const…



Self-Predictive Representation for Autonomous UAV Object-Goal Navigation

Angel Ayala, Donling Sui, Francisco Cruz, Mitchell Torok, Mohammad Deghat, Bruno J. T. Fernandes · 2026

Autonomous Unmanned Aerial Vehicles (UAVs) have revolutionized industries through their versatility with applications including aerial surveillance, search and rescue, agriculture, and delivery. Their…



Neuro-Symbolic Manipulation Understanding with Enriched Semantic Event Chains

Fatemeh Ziaeetabar · 2026

Robotic systems operating in human environments must reason about how object interactions evolve over time, which actions are currently being performed, and what manipulation step is likely to follow.…



Reducing the Offline-Streaming Gap for Unified ASR Transducer with Consistency Regularization

Andrei Andrusenko, Vladimir Bataev, Lilit Grigoryan, Nune Tadevosyan, Vitaly Lavrukhin, Boris Ginsburg · 2026

Unification of automatic speech recognition (ASR) systems reduces development and maintenance costs, but training a single model to perform well in both offline and low-latency streaming settings rema…



NIM4-ASR: Towards Efficient, Robust, and Customizable Real-Time LLM-Based ASR

Yuan Xie, Jiaqi Song, Guang Qiu, Xianliang Wang, Kai Qiao, Junfeng Yuan, Shengqing Liu, Yi Zhang, Bowen Chen, Ming Lei, Jie Gao, Jie Wu · 2026

Integrating large language models (LLMs) into automatic speech recognition (ASR) has become a mainstream paradigm in recent years. Although existing LLM-based ASR models demonstrate impressive perform…



Prosody as Supervision: Bridging the Non-Verbal--Verbal for Multilingual Speech Emotion Recognition

Girish, Mohd Mujtaba Akhtar, Muskaan Singh · 2026

In this work, we introduce a paralinguistic supervision paradigm for low-resource multilingual speech emotion recognition (LRM-SER) that leverages non-verbal vocalizations to exploit prosody-centric e…



BreathAI: Transfer Learning-Based Thermal Imaging for Automated Breathing Pattern Recognition

Hamza Kheddar, Yassine Himeur, Abbes Amira · 2026

This study presents an Adaptive Transfer Learning and Thresholding-based Deep Learning Model (ATL-TDLM) for automated breathing pattern recognition using thermal imaging. Unlike conventional methods t…



CADRE: Card-Agnostic Domain-Aligned RF Embeddings for Virtual PIN Pads on Passive NFC Cards

Dickson Akuoko Sarpong, Hongzhi Guo · 2026

Near Field Communication (NFC) cards are widely used for identification, but their passive nature often limits the ability to incorporate additional security mechanisms. As a result, anyone holding th…



Anonymization, Not Elimination: Utility-Preserved Speech Anonymization

Yunchong Xiao, Yuxiang Zhao, Ziyang Ma, Shuai Wang, Kai Yu, Jiachun Liao, Xie Chen · 2026

The growing reliance on large-scale speech data has made privacy protection a critical concern. However, existing anonymization approaches often degrade data utility, for example by disrupting acousti…



Human Cognition in Machines: A Unified Perspective of World Models

Timothy Rupprecht, Pu Zhao, Amir Taherin, Arash Akbari, Arman Akbari, Yumei He, Sean Duffy, Juyi Lin, Yixiao Chen, Rahul Chowdhury, Enfu Nan, Yixin Shen, Yifan Cao, Haochen Zeng, Weiwei Chen, Geng Yuan, Jennifer Dy, Sarah Ostadabbas, Silvia Zhang, David Kaeli, Edmund Yeh, Yanzhi Wang · 2026

This comprehensive report distinguishes prior works by the cognitive functions they innovate. Many works claim an almost "human-like" cognitive capability in their world models. To evaluate these clai…
