Steven Creech in Engineering — Research Repository

Engineering Preprint PDF DOI

LRS-VoxMM: A benchmark for in-the-wild audio-visual speech recognition

Doyeop Kwak, Jeongsoo Choi, Suyeon Lee, Joon Son Chung · 2026

We introduce LRS-VoxMM, an in-the-wild benchmark for audio-visual speech recognition (AVSR). The benchmark is derived from VoxMM, a dataset of diverse real-world spoken conversations with human-annota…

BUT System Description for CHiME-9 MCoRec Challenge

Dominik Klement, Alexander Polok, Nguyen Hai Phong, Prachi Singh, Lukas Burget · 2026

Multi-talker automatic speech recognition (ASR) in conversational recordings remains an open problem, particularly in scenarios with a large portion of overlapping speech where identifying and transcrib…

A Knowledge-Driven Approach to Target Speech Extraction in the Presence of Background Sound Effects for Cinematic Audio Source Separation (CASS)

Chun-wei Ho, Sabato Marco Siniscalchi, Kai Li, Chin-Hui Lee · 2026

We propose a knowledge-driven approach to target speech extraction in the presence of background sound effects already recorded in cinematic audio. The specific knowledge sources studied are manners o…

Interaction Forces and Internal Loads in Parallel Manipulators with Actuation Redundancy

Joshua Flight, Clement Gosselin · 2026

This paper discusses null-space wrench components in parallel manipulators. We examine the adaptation of the two most common characterizations of these components in grasp-like systems, namely, intera…

The False Resonance: A Critical Examination of Emotion Embedding Similarity for Speech Generation Evaluation

Yun-Shao Tsai, Yi-Cheng Lin, Huang-Cheng Chou, Tzu-Wen Hsu, Yun-Man Hsu, Chun Wei Chen, Shrikanth Narayanan, Hung-yi Lee · 2026

Objective metrics for emotional expressiveness are vital for speech generation, particularly in expressive synthesis and voice conversion requiring emotional prosody transfer. To quantify this, the fi…

SPG-Codec: Exploring the Role and Boundaries of Semantic Priors in Ultra-Low-Bitrate Neural Speech Coding

Mingyu Zhao, Zijian Lin, Kun Wei, Zhiyong Wu · 2026

Conventional neural speech codecs suffer from severe intelligibility degradation at ultra-low bitrates, where the bottleneck transitions from acoustic distortion to semantic loss. To address this issu…

One Voice, Many Tongues: Cross-Lingual Voice Cloning for Scientific Speech

Amanuel Gizachew Abebe, Yasmin Moslem · 2026

Preserving a speaker's voice identity while generating speech in a different language remains a fundamental challenge in spoken language technology, particularly in specialized domains such as scienti…

Data Driven Calibration of Analytical Concrete Creep Models Considering Preloading Effects Using Gaussian Processes

Leonie Heller, Christopher Taube, Gledson Rodrigo Tondo, Guido Morgenthal · 2026

The time-dependent deformation of concrete, particularly creep, remains a key challenge for reliable and material-efficient design. Experimental results show that tailored preloading, short-term loads…

UNet-Based Fusion and Exponential Moving Average Adaptation for Noise-Robust Speaker Recognition

Chong-Xin Gan, Peter Bell, Man-Wai Mak, Zhe Li, Zezhong Jin, Zilong Huang, Kong Aik Lee · 2026

The joint training of speech enhancement and speaker embedding networks for speaker recognition is widely adopted under noisy acoustic environments. While effective, this paradigm often fails to lever…

Robust Accent Identification via Voice Conversion and Non-Timbral Embeddings

Rayane Bakari, Olivier Le Blouch, Nicolas Gengembre, Nicholas Evans · 2026

Automatic accent identification (AID) remains a challenging task due to the complex variability of accents, the entanglement of accent cues with speaker traits, and the scarcity of reliable accent-labe…

Cross-Linguistic Rhythmic and Spectral Feature-Based Analysis of Nyishi and Adi: Two Under-Resourced Languages of Arunachal Pradesh

Deepshikha Gogoi, Parismita Gogoi, Yang Saring · 2026

Under-resourced languages remain underrepresented in quantitative rhythm research, particularly in systematic intra-branch analysis of acoustic differentiation within closely related linguistic groups.…

Comparative Evaluation of Modern Deep Learning Methodologies for Portfolio Optimization

Samuel Ozechi, Banjo Francis, Wisdom Yakanu, Joe Wayne Byers · 2026

This study proposes a portfolio optimization framework that integrates advanced deep learning architectures with traditional financial models to enhance risk-adjusted performance. Using historical dat…

Signal Processing Foundations of Reconfigurable Antennas in the Tri-Hybrid MIMO Architecture

Nitish Vikas Deshpande, Joseph Carlson, Siyun Yang, Mohamed Akrout, Alfredo Gonzalez, Miguel Rodrigo Castellanos, Tharmalingam Ratnarajah, Chan-Byoung Chae, Robert W. Heath Jr · 2026

To enable larger apertures in multiple-input multiple-output (MIMO) systems, the tri-hybrid MIMO architecture offers a promising low-cost and low-power solution by introducing reconfigurable antennas as a thi…

DM-ASR: Diarization-aware Multi-speaker ASR with Large Language Models

Li Li, Ming Cheng, Weixin Zhu, Yannan Wang, Juan Liu, Ming Li · 2026

Multi-speaker automatic speech recognition (ASR) aims to transcribe conversational speech involving multiple speakers, requiring the model to capture not only what was said, but also who said it and s…

A Kinematic Analysis of Palm Degrees of Freedom for Enhancing Thumb Opposability in Robotic Hands

HyoJae Kang, Yeong Jae Park, Hyunmok Jung, Joonho Lee, Dong Il Park · 2026

This study investigates the kinematic role of palm degrees of freedom (DoF) in enhancing thumb opposability in a five-finger robotic hand. A hand model consisting of a five DoF thumb and four fingers …

UniSonate: A Unified Model for Speech, Music, and Sound Effect Generation with Text Instructions

Chunyu Qiang, Xiaopeng Wang, Kang Yin, Yuzhe Liang, Yuxin Guo, Teng Ma, Ziyu Zhang, Tianrui Wang, Cheng Gong, Yushen Chen, Ruibo Fu, Chen Zhang, Longbiao Wang, Jianwu Dang · 2026

Generative audio modeling has largely been fragmented into specialized tasks, text-to-speech (TTS), text-to-music (TTM), and text-to-audio (TTA), each operating under heterogeneous control paradigms. …

Advancing automatic speech recognition using feature fusion with self-supervised learning features: A case study on Fearless Steps Apollo corpus

Szu-Jui Chen, John H.L. Hansen · 2026

Using self-supervised learning (SSL) models has significantly improved performance for downstream speech tasks, surpassing the capabilities of traditional hand-crafted features. This study investigate…

DiariZen Explained: A Tutorial for the Open Source State-of-the-Art Speaker Diarization Pipeline

Nikhil Raghav · 2026

Speaker diarization (SD) is the task of answering "who spoke when" in a multi-speaker audio stream. Classically, an SD system clusters segments of speech belonging to an individual speaker's identity.…

Full-Duplex Interaction in Spoken Dialogue Systems: A Comprehensive Study from the ICASSP 2026 HumDial Challenge

Chengyou Wang, Hongfei Xue, Guojian Li, Zhixian Zhao, Shuiyuan Wang, Shuai Wang, Xin Xu, Hui Bu, Lei Xie · 2026

Full-duplex interaction, where speakers and listeners converse simultaneously, is a key element of human communication often missing from traditional spoken dialogue systems. These systems, based on r…

FingerEye: Continuous and Unified Vision-Tactile Sensing for Dexterous Manipulation

Zhixuan Xu, Yichen Li, Xuanye Wu, Tianyu Qiu, Lin Shao · 2026

Dexterous robotic manipulation requires comprehensive perception across all phases of interaction: pre-contact, contact initiation, and post-contact. Such continuous feedback allows a robot to adapt i…
