Spencer Rogers in Engineering — Research Repository

Engineering Preprint PDF DOI

Flying by Inference: Active Inference World Models for Adaptive UAV Swarms

Kaleem Arshid, Ali Krayani, Lucio Marcenaro, David Martin Gomez, Carlo Regazzoni · 2026

This paper presents an expert-guided active-inference-inspired framework for adaptive UAV swarm trajectory planning. The proposed method converts multi-UAV trajectory design from a repeated combinator…

LRS-VoxMM: A benchmark for in-the-wild audio-visual speech recognition

Doyeop Kwak, Jeongsoo Choi, Suyeon Lee, Joon Son Chung · 2026

We introduce LRS-VoxMM, an in-the-wild benchmark for audio-visual speech recognition (AVSR). The benchmark is derived from VoxMM, a dataset of diverse real-world spoken conversations with human-annota…

BUT System Description for CHiME-9 MCoRec Challenge

Dominik Klement, Alexander Polok, Nguyen Hai Phong, Prachi Singh, Lukas Burget · 2026

Multi-talker automatic speech recognition (ASR) in conversational recordings remains an open problem, particularly in scenarios with a large portion of overlapping speech where identifying and transcrib…

The False Resonance: A Critical Examination of Emotion Embedding Similarity for Speech Generation Evaluation

Yun-Shao Tsai, Yi-Cheng Lin, Huang-Cheng Chou, Tzu-Wen Hsu, Yun-Man Hsu, Chun Wei Chen, Shrikanth Narayanan, Hung-yi Lee · 2026

Objective metrics for emotional expressiveness are vital for speech generation, particularly in expressive synthesis and voice conversion requiring emotional prosody transfer. To quantify this, the fi…

Dual-LoRA: Parameter-Efficient Adversarial Disentanglement for Cross-Lingual Speaker Verification

Qituan Shangguan, Junhao Du, Kunyang Peng, Feng Xue, Hui Zhang, Xinsheng Wang, Kai Yu, Shuai Wang · 2026

Cross-lingual speaker verification suffers from severe language-speaker entanglement. This causes systematic degradation in the hardest scenario: correctly accepting utterances from the same speaker a…

DiffAnon: Diffusion-based Prosody Control for Voice Anonymization

Ismail Rasim Ulgen, Zexin Cai, Nicholas Andrews, Philipp Koehn, Berrak Sisman · 2026

To preserve or not to preserve prosody is a central question in voice anonymization. Prosody conveys meaning and affect, yet is tightly coupled with speaker identity. Existing methods either discard p…

One Voice, Many Tongues: Cross-Lingual Voice Cloning for Scientific Speech

Amanuel Gizachew Abebe, Yasmin Moslem · 2026

Preserving a speaker's voice identity while generating speech in a different language remains a fundamental challenge in spoken language technology, particularly in specialized domains such as scienti…

SlicerRoboTMS: An Open-Source 3D Slicer Extension for Robot-Assisted Transcranial Magnetic Stimulation

Wenzhi Bai, Yituo Guo, Bhaskar Basu, Andrew Weightman, Zhenhong Li · 2026

Robot-assisted Transcranial Magnetic Stimulation (Robo-TMS) is an image-guided robotic intervention that enhances the accuracy and reproducibility of conventional Transcranial Magnetic Stimulation (TM…

UNet-Based Fusion and Exponential Moving Average Adaptation for Noise-Robust Speaker Recognition

Chong-Xin Gan, Peter Bell, Man-Wai Mak, Zhe Li, Zezhong Jin, Zilong Huang, Kong Aik Lee · 2026

The joint training of speech enhancement and speaker embedding networks for speaker recognition is widely adopted under noisy acoustic environments. While effective, this paradigm often fails to lever…

Robust Accent Identification via Voice Conversion and Non-Timbral Embeddings

Rayane Bakari, Olivier Le Blouch, Nicolas Gengembre, Nicholas Evans · 2026

Automatic accent identification (AID) remains a challenging task due to the complex variability of accents, the entanglement of accent cues with speaker traits, and the scarcity of reliable accentlabe…

Reduced-Order Data Assimilation for Thermospheric Density Using Physics-informed SINDyc Models

Sriram Narayanan, Daniele Sicoli, Piyush Mehta · 2026

Accurate estimation of thermospheric mass density is a prerequisite for orbit prediction and space situational awareness, where the upper atmosphere responds nonlinearly to solar and geomagnetic forci…

PILOT: One Physics-Integrated Generation Framework to Unify 2D and 3D Radio Map Construction

Weiming Huang, Hao Sun, Junting Chen · 2026

Unified 2D and 3D radio map construction supports network planning, wireless digital twins, and unmanned aerial vehicle (UAV) applications. In urban environments, blockage, reflection, and diffraction…

Explainable AI in Speaker Recognition -- Making Latent Representations Understandable

Yanze Xu, Wenwu Wang, Mark D. Plumbley · 2026

Neural networks can be trained to learn task-relevant representations from data. Understanding how these networks make decisions falls within the Explainable AI (XAI) domain. This paper proposes to st…

DM-ASR: Diarization-aware Multi-speaker ASR with Large Language Models

Li Li, Ming Cheng, Weixin Zhu, Yannan Wang, Juan Liu, Ming Li · 2026

Multi-speaker automatic speech recognition (ASR) aims to transcribe conversational speech involving multiple speakers, requiring the model to capture not only what was said, but also who said it and s…

LeHome: A Simulation Environment for Deformable Object Manipulation in Household Scenarios

Zeyi Li, Yushi Yang, Shawn Xie, Kyle Xu, Tianxing Chen, Yuran Wang, Zhenhao Shen, Yan Shen, Yue Chen, Wenjun Li, Yukun Zheng, Chaorui Zhang, Siyi Lin, Fei Teng, Hongjun Yang, Ming Chen, Steve Xie, Ruihai Wu · 2026

Household environments present one of the most common, impactful yet challenging application domains for robotics. Within household scenarios, manipulating deformable objects is particularly difficult…

DiariZen Explained: A Tutorial for the Open Source State-of-the-Art Speaker Diarization Pipeline

Nikhil Raghav · 2026

Speaker diarization (SD) is the task of answering "who spoke when" in a multi-speaker audio stream. Classically, an SD system clusters segments of speech belonging to an individual speaker's identity.…

Indic-CodecFake meets SATYAM: Towards Detecting Neural Audio Codec Synthesized Speech Deepfakes in Indic Languages

Girish, Mohd Mujtaba Akhtar, Orchid Chetia Phukan, Arun Balaji Buduru · 2026

The rapid advancement of Audio Large Language Models (ALMs), driven by Neural Audio Codecs (NACs), has led to the emergence of highly realistic speech deepfakes, commonly referred to as CodecFakes (CF…

Simulation of Switching Converters on the Level of Averaged Voltages and Currents

Aleksandra Lekic, Predrag Pejovic · 2026

An algorithm for simulation of switching converters is proposed in the paper. The algorithm is based on simulation of an averaged circuit model applying the "switching cell" concept, and construction of inst…

Maritime Connectivity Vulnerability Index: Construction, Patterns, and Validation Across 185 Economies, 2006-2025

Mohamed Bouka, Moulaye Abdel Kader Moulaye Ismail · 2026

Recent disruptions at major maritime chokepoints have exposed the structural fragility of liner shipping networks. Existing indicators measure connectivity, but none quantify its structural vulnerabil…
