David E Speyer in Engineering — Research Repository

Engineering Preprint PDF DOI

BUT System Description for CHiME-9 MCoRec Challenge

Dominik Klement, Alexander Polok, Nguyen Hai Phong, Prachi Singh, Lukas Burget · 2026

Multi-talker automatic speech recognition (ASR) in conversational recordings remains an open problem, particularly in scenarios with a large portion of overlapping speech where identifying and transcrib…

The False Resonance: A Critical Examination of Emotion Embedding Similarity for Speech Generation Evaluation

Yun-Shao Tsai, Yi-Cheng Lin, Huang-Cheng Chou, Tzu-Wen Hsu, Yun-Man Hsu, Chun Wei Chen, Shrikanth Narayanan, Hung-yi Lee · 2026

Objective metrics for emotional expressiveness are vital for speech generation, particularly in expressive synthesis and voice conversion requiring emotional prosody transfer. To quantify this, the fi…

Dual-LoRA: Parameter-Efficient Adversarial Disentanglement for Cross-Lingual Speaker Verification

Qituan Shangguan, Junhao Du, Kunyang Peng, Feng Xue, Hui Zhang, Xinsheng Wang, Kai Yu, Shuai Wang · 2026

Cross-lingual speaker verification suffers from severe language-speaker entanglement. This causes systematic degradation in the hardest scenario: correctly accepting utterances from the same speaker a…

DiffAnon: Diffusion-based Prosody Control for Voice Anonymization

Ismail Rasim Ulgen, Zexin Cai, Nicholas Andrews, Philipp Koehn, Berrak Sisman · 2026

To preserve or not to preserve prosody is a central question in voice anonymization. Prosody conveys meaning and affect, yet is tightly coupled with speaker identity. Existing methods either discard p…

One Voice, Many Tongues: Cross-Lingual Voice Cloning for Scientific Speech

Amanuel Gizachew Abebe, Yasmin Moslem · 2026

Preserving a speaker's voice identity while generating speech in a different language remains a fundamental challenge in spoken language technology, particularly in specialized domains such as scienti…

UNet-Based Fusion and Exponential Moving Average Adaptation for Noise-Robust Speaker Recognition

Chong-Xin Gan, Peter Bell, Man-Wai Mak, Zhe Li, Zezhong Jin, Zilong Huang, Kong Aik Lee · 2026

The joint training of speech enhancement and speaker embedding networks for speaker recognition is widely adopted in noisy acoustic environments. While effective, this paradigm often fails to lever…

Robust Accent Identification via Voice Conversion and Non-Timbral Embeddings

Rayane Bakari, Olivier Le Blouch, Nicolas Gengembre, Nicholas Evans · 2026

Automatic accent identification (AID) remains a challenging task due to the complex variability of accents, the entanglement of accent cues with speaker traits, and the scarcity of reliable accentlabe…

Explainable AI in Speaker Recognition -- Making Latent Representations Understandable

Yanze Xu, Wenwu Wang, Mark D. Plumbley · 2026

Neural networks can be trained to learn task-relevant representations from data. Understanding how these networks make decisions falls within the Explainable AI (XAI) domain. This paper proposes to st…

DM-ASR: Diarization-aware Multi-speaker ASR with Large Language Models

Li Li, Ming Cheng, Weixin Zhu, Yannan Wang, Juan Liu, Ming Li · 2026

Multi-speaker automatic speech recognition (ASR) aims to transcribe conversational speech involving multiple speakers, requiring the model to capture not only what was said, but also who said it and s…

DiariZen Explained: A Tutorial for the Open Source State-of-the-Art Speaker Diarization Pipeline

Nikhil Raghav · 2026

Speaker diarization (SD) is the task of answering "who spoke when" in a multi-speaker audio stream. Classically, an SD system clusters segments of speech belonging to an individual speaker's identity.…

CKM Beyond Channel Gain: Spatial Correlation Map Construction with Deep Learning

Z. Chen, S. Fu, Y. Zeng, X. Xu, Z. Wei · 2026

Channel knowledge map (CKM) is a promising technique to achieve environment-aware wireless communication and sensing. Constructing the complete CKM based on channel knowledge observations at sparse lo…

Algebraic Diversity: Principles of a Group-Theoretic Approach to Signal Processing

Mitchell A. Thornton · 2026

We present principles of algebraic diversity (AD), a group-theoretic approach to signal processing exploiting signal symmetry to extract more information per observation, complementing classical metho…

Indic-CodecFake meets SATYAM: Towards Detecting Neural Audio Codec Synthesized Speech Deepfakes in Indic Languages

Girish, Mohd Mujtaba Akhtar, Orchid Chetia Phukan, Arun Balaji Buduru · 2026

The rapid advancement of Audio Large Language Models (ALMs), driven by Neural Audio Codecs (NACs), has led to the emergence of highly realistic speech deepfakes, commonly referred to as CodecFakes (CF…

Odour sensing in turbulent plumes with high-speed electronic nose and non-invasive ground truth

Nik Dennler, Elle Stark, Saimon Collaku, Lars Larson, Andre van Schaik, Michael Schmuker, John Crimaldi, Andreas T. Guntner, Aaron True · 2026

Chemical sensing in real-world environments requires resolving rapidly fluctuating and spatially heterogeneous concentration fields. However, these dynamics are strongly distorted by widely used, low-…

HALO: Hybrid Auto-encoded Locomotion with Learned Latent Dynamics, Poincaré Maps, and Regions of Attraction

Blake Werner, Sergio A. Esteban, Massimiliano De Sa, Max H. Cohen, Aaron D. Ames · 2026

Reduced-order models are powerful for analyzing and controlling high-dimensional dynamical systems. Yet constructing these models for complex hybrid systems such as legged robots remains challenging. …

Cramér-Rao Bound Optimization for Near-Field ISAC with Extended Targets

Zongyao Zhao, Zhaolin Wang, Lincong Han, Liang Xu, Jing Jin, Yuanwei Liu, Kaibin Huang · 2026

Near-field integrated sensing and communication (ISAC) requires target models beyond the point-target abstraction when the target has a non-negligible spatial extent. In this letter, a geometry-aware …

Prosody as Supervision: Bridging the Non-Verbal--Verbal for Multilingual Speech Emotion Recognition

Girish, Mohd Mujtaba Akhtar, Muskaan Singh · 2026

In this work, we introduce a paralinguistic supervision paradigm for low-resource multilingual speech emotion recognition (LRM-SER) that leverages non-verbal vocalizations to exploit prosody-centric e…

HCFD: A Benchmark for Audio Deepfake Detection in Healthcare

Mohd Mujtaba Akhtar, Girish, Muskaan Singh · 2026

In this study, we present Healthcare Codec-Fake Detection (HCFD), a new task for detecting codec-fakes under pathological speech conditions. We intentionally focus on codec-based synthetic speech in t…

Active MIMO Sensing With Exploration-Exploitation Tradeoff

Nadim Ghaddar, Kareem M. Attiah, Wei Yu · 2026

This paper develops an active sensing framework for designing the transmit and receive beamformers of a multiple-input multiple-output (MIMO) radar system. In the proposed technique, the beamformers a…

Anonymization, Not Elimination: Utility-Preserved Speech Anonymization

Yunchong Xiao, Yuxiang Zhao, Ziyang Ma, Shuai Wang, Kai Yu, Jiachun Liao, Xie Chen · 2026

The growing reliance on large-scale speech data has made privacy protection a critical concern. However, existing anonymization approaches often degrade data utility, for example by disrupting acousti…
