Linghao Zhang in Engineering — Research Repository

Engineering Preprint PDF DOI

Dual-LoRA: Parameter-Efficient Adversarial Disentanglement for Cross-Lingual Speaker Verification

Qituan Shangguan, Junhao Du, Kunyang Peng, Feng Xue, Hui Zhang, Xinsheng Wang, Kai Yu, Shuai Wang · 2026

Cross-lingual speaker verification suffers from severe language-speaker entanglement. This causes systematic degradation in the hardest scenario: correctly accepting utterances from the same speaker a…

One Voice, Many Tongues: Cross-Lingual Voice Cloning for Scientific Speech

Amanuel Gizachew Abebe, Yasmin Moslem · 2026

Preserving a speaker's voice identity while generating speech in a different language remains a fundamental challenge in spoken language technology, particularly in specialized domains such as scienti…

Navigating the Clutter: Waypoint-Based Bi-Level Planning for Multi-Robot Systems

Jiabao Ji, Yongchao Chen, Yang Zhang, Ramana Rao Kompella, Chuchu Fan, Gaowen Liu, Shiyu Chang · 2026

Multi-robot control in cluttered environments is a challenging problem that involves complex physical constraints, including robot-robot collisions, robot-obstacle collisions, and unreachable motions.…

Prosody as Supervision: Bridging the Non-Verbal–Verbal for Multilingual Speech Emotion Recognition

Girish, Mohd Mujtaba Akhtar, Muskaan Singh · 2026

In this work, we introduce a paralinguistic supervision paradigm for low-resource multilingual speech emotion recognition (LRM-SER) that leverages non-verbal vocalizations to exploit prosody-centric e…

Data-Driven Reachability Analysis Using Matrix Perturbation Theory

Peng Xie, Abdulla Fawzy, Zhen Zhang, Amr Alanwar · 2026

We propose a matrix zonotope perturbation framework that leverages matrix perturbation theory to characterize how noise-induced distortions alter the dynamics within sets of models. The framework deri…

X-VC: Zero-shot Streaming Voice Conversion in Codec Space

Qixi Zheng, Yuxiang Zhao, Tianrui Wang, Wenxi Chen, Kele Xu, Yikang Li, Qinyuan Chen, Xipeng Qiu, Kai Yu, Xie Chen · 2026

Zero-shot voice conversion (VC) aims to convert a source utterance into the voice of an unseen target speaker while preserving its linguistic content. Although recent systems have improved conversion …

Second Order Physics-Informed Learning of Road Density using Probe Vehicles

S. Betancur Giraldo, J. Mårtensson, M. Barreau · 2026

We propose a Physics Informed Learning framework for reconstructing traffic density from sparse trajectory data. The approach combines a second-order Aw-Rascle and Zhang model with a first-order train…

Advancing LLM-based phoneme-to-grapheme for multilingual speech recognition

Lukuang Dong, Ziwei Li, Saierdaer Yusuyin, Xianyu Zhao, Zhijian Ou · 2026

Phoneme-based ASR factorizes recognition into speech-to-phoneme (S2P) and phoneme-to-grapheme (P2G), enabling cross-lingual acoustic sharing while keeping language-specific orthography in a separate m…

E-SocialNav: Efficient Socially Compliant Navigation with Language Models

Ling Xiao, Daeun Song, Xuesu Xiao, Toshihiko Yamasaki · 2026

Language models (LMs) are increasingly applied to robotic navigation; however, existing benchmarks primarily emphasize navigation success rates while paying limited attention to social compliance. Mor…

Quantifying Cross-Lingual Transfer in Paralinguistic Speech Tasks

Pol Buitrago, Oriol Pareras, Federico Costa, Javier Hernando · 2026

Paralinguistic speech tasks are often considered relatively language-agnostic, as they rely on extralinguistic acoustic cues rather than lexical content. However, prior studies report performance degr…

Language-Invariant Multilingual Speaker Verification for the TidyVoice 2026 Challenge

Ze Li, Xiaoxiao Miao, Juan Liu, Ming Li · 2026

Multilingual speaker verification (SV) remains challenging due to limited cross-lingual data and language-dependent information in speaker embeddings. This paper presents a language-invariant multilin…

STRIDE: Post-Training LLMs to Reason and Refine Bio-Sequences via Edit Trajectories

Daiheng Zhang, Shiyang Zhang, Sizhuang He, Yangtian Zhang, Syed Asad Rizvi, David van Dijk · 2026

Discrete biological sequence optimization requires iterative refinement under strict syntactic constraints. Diffusion models offer progressive refinement but do not naturally expose controllable discr…

The PARLO Dementia Corpus: A German Multi-Center Resource for Alzheimer's Disease

Franziska Braun, Christopher Witzl, Florian Hönig, Elmar Nöth, Tobias Bocklet, Korbinian Riedhammer · 2026

Early and accessible detection of Alzheimer's disease (AD) remains a major challenge, as current diagnostic methods often rely on costly and invasive biomarkers. Speech and language analysis has emerg…

TidyVoice 2026 Challenge Evaluation Plan

Aref Farhadipour, Jan Marquenie, Srikanth Madikeri, Teodora Vukovic, Volker Dellwo, Kathy Reid, Francis M. Tyers, Ingo Siegert, Eleanor Chodroff · 2026

The performance of speaker verification systems degrades significantly under language mismatch, a critical challenge exacerbated by the field's reliance on English-centric data. To address this, we pr…

Towards Language-Independent Face-Voice Association with Multimodal Foundation Models

Aref Farhadipour, Teodora Vukovic, Volker Dellwo · 2025

This paper describes the UZH-CL system submitted to the FAME2026 Challenge. The challenge focuses on cross-modal verification under unique multilingual conditions, specifically unseen and unheard lang…

TokCom-UEP: Semantic Importance-Matched Unequal Error Protection for Resilient Image Transmission

Kaizheng Zhang, Zuolin Jin, Zhihang Cheng, Ming Zeng, Li Qiao, Zesong Fei · 2025

SyncVoice: Towards Video Dubbing with Vision-Augmented Pretrained TTS Model

Kaidi Wang, Yi He, Wenhao Guan, Weijie Wu, Hongwu Ding, Xiong Zhang, Di Wu, Meng Meng, Jian Luan, Lin Li, Qingyang Hong · 2025

Video dubbing aims to generate high-fidelity speech that is precisely temporally aligned with the visual content. Existing methods still suffer from limitations in speech naturalness and audio-visual …

TTA: Transcribe, Translate and Alignment for Cross-lingual Speech Representation

Wei Liu, Jiahong Li, Yiwen Shao, Dong Yu · 2025

Speech-LLM models have demonstrated great performance in multi-modal and multi-task speech understanding. A typical speech-LLM paradigm is integrating speech modality with a large language model (LLM)…

Event-Triggered Regulation of Mixed-Autonomy Traffic Under Varying Traffic Conditions

Yihuai Zhang, Huan Yu · 2025

Modeling and congestion mitigation of mixed-autonomy traffic systems consisting of human-driven vehicles (HVs) and autonomous vehicles (AVs) have become increasingly critical with the rapid developmen…

VoiceCraft-X: Unifying Multilingual, Voice-Cloning Speech Synthesis and Speech Editing

Zhisheng Zheng, Puyuan Peng, Anuj Diwan, Cong Phuoc Huynh, Xiaohang Sun, Zhu Liu, Vimal Bhat, David Harwath · 2025

We introduce VoiceCraft-X, an autoregressive neural codec language model which unifies multilingual speech editing and zero-shot Text-to-Speech (TTS) synthesis across 11 languages: English, Mandarin, …
