Xiaolin Huang in Engineering — Research Repository

Engineering Preprint PDF DOI

UniPASE: A Generative Model for Universal Speech Enhancement with High Fidelity and Low Hallucinations

Xiaobin Rong, Zheng Wang, Yushi Wang, Jun Gao, Jing Lu · 2026

Universal speech enhancement (USE) aims to restore speech signals from diverse distortions across multiple sampling rates. We propose UniPASE, an extension of the low-hallucination PASE framework tail…

Engineering Preprint PDF DOI

GAP-URGENet: A Generative-Predictive Fusion Framework for Universal Speech Enhancement

Xiaobin Rong, Yushi Wang, Zheng Wang, Jing Lu · 2026

We introduce GAP-URGENet, a generative-predictive fusion framework developed for Track 1 of the ICASSP 2026 URGENT Challenge. The system integrates a generative branch, which performs full-stack speec…

Engineering Preprint PDF DOI

Agent-Driven Autonomous Reinforcement Learning Research: Iterative Policy Improvement for Quadruped Locomotion

Nimesh Khandelwal, Shakti S. Gupta · 2026

This paper documents a case study in agent-driven autonomous reinforcement learning research for quadruped locomotion. The setting was not a fully self-starting research system. A human provided high-…

Engineering Preprint PDF DOI

ACAVCaps: Enabling large-scale training for fine-grained and diverse audio understanding

Yadong Niu, Tianzi Wang, Heinrich Dinkel, Xingwei Sun, Jiahao Zhou, Gang Li, Jizhong Liu, Junbo Zhang, Jian Luan · 2026

General audio understanding is a fundamental goal for large audio-language models, with audio captioning serving as a cornerstone task for their development. However, progress in this domain is hinder…

Engineering Preprint PDF DOI

StuPASE: Towards Low-Hallucination Studio-Quality Generative Speech Enhancement

Xiaobin Rong, Jun Gao, Zheng Wang, Mansur Yesilbursa, Kamil Wojcicki, Jing Lu · 2026

Achieving high perceptual quality without hallucination remains a challenge in generative speech enhancement (SE). A representative approach, PASE, is robust to hallucination but has limited perceptua…

Engineering Preprint PDF DOI

Physics-Informed Anomaly Detection of Terrain Material Change in Radar Imagery

Abdel Hakiem Mohamed Abbas Mohamed Ahmed, Beth Jelfs, Airlie Chapman, Eric Schoof, Christopher Gilliam · 2026

In this paper we consider physics-informed detection of terrain material change in radar imagery (e.g., shifts in permittivity, roughness or moisture). We propose a lightweight electromagnetic (EM) fo…

Engineering Preprint PDF DOI

Xiaomi-Robotics-0: An Open-Sourced Vision-Language-Action Model with Real-Time Execution

Rui Cai, Jun Guo, Xinze He, Piaopiao Jin, Jie Li, Bingxuan Lin, Futeng Liu, Wei Liu, Fei Ma, Kun Ma, Feng Qiu, Heng Qu, Yifei Su, Qiao Sun, Dong Wang, Donghao Wang, Yunhong Wang, Rujie Wu, Diyun Xiang, Yu Yang, Hangjun Ye, Yuan Zhang, Quanyun Zhou · 2026

In this report, we introduce Xiaomi-Robotics-0, an advanced vision-language-action (VLA) model optimized for high performance and fast and smooth real-time execution. The key to our method lies in a c…

Engineering Preprint PDF DOI

From Score to Sound: An End-to-End MIDI-to-Motion Pipeline for Robotic Cello Performance

Samantha Sudhoff, Pranesh Velmurugan, Jiashu Liu, Vincent Zhao, Yung-Hsiang Lu, Kristen Yeon-Ji Yun · 2026

Robot musicians require precise control to obtain proper note accuracy, sound quality, and musical expression. Performance of string instruments, such as violin and cello, presents a significant chall…

Engineering Preprint PDF DOI

MS-PPO: Morphological-Symmetry-Equivariant Policy for Legged Robot Locomotion

Sizhe Wei, Xulin Chen, Fengze Xie, Garrett Ethan Katz, Zhenyu Gan, Lu Gan · 2025

Reinforcement learning has recently enabled impressive locomotion capabilities on legged robots; however, most policy architectures remain morphology- and symmetry-agnostic, leading to inefficient tra…

Engineering Preprint PDF DOI

One-shot Adaptation of Humanoid Whole-body Motion with Walking Priors

Hao Huang, Geeta Chandra Raju Bethala, Shuaihang Yuan, Congcong Wen, Mengyu Wang, Anthony Tzes, Yi Fang · 2025

Whole-body humanoid motion represents a fundamental challenge in robotics, requiring balance, coordination, and adaptability to enable human-like behaviors. However, existing methods typically require…

Engineering Preprint PDF DOI

Single-Rod Brachiation Robot: Mechatronic Control Design and Validation of Prejump Phases

Juraj Lieskovsky, Hijiri Akahane, Aoto Osawa, Jaroslav Busek, Ikuo Mizuuchi, Tomas Vyhlidal · 2025

A complete mechatronic design of a minimal configuration brachiation robot is presented. The robot consists of a single rigid rod with gripper mechanisms attached to both ends. The grippers are used t…

Engineering Preprint PDF DOI

Team Xiaomi EV-AD VLA: Caption-Guided Retrieval System for Cross-Modal Drone Navigation -- Technical Report for IROS 2025 RoboSense Challenge Track 4

Lingfeng Zhang, Erjia Xiao, Yuchen Zhang, Haoxiang Fu, Ruibin Hu, Yanbiao Ma, Wenbo Ding, Long Chen, Hangjun Ye, Xiaoshuai Hao · 2025

Cross-modal drone navigation remains a challenging task in robotics, requiring efficient retrieval of relevant images from large-scale databases based on natural language descriptions. The RoboSense 2…

Engineering Preprint PDF DOI

MECAT: A Multi-Experts Constructed Benchmark for Fine-Grained Audio Understanding Tasks

Yadong Niu, Tianzi Wang, Heinrich Dinkel, Xingwei Sun, Jiahao Zhou, Gang Li, Jizhong Liu, Xunying Liu, Junbo Zhang, Jian Luan · 2025

While large audio-language models have advanced open-ended audio understanding, they still fall short of nuanced human-level comprehension. This gap persists largely because current benchmarks, limite…

Engineering Preprint PDF DOI

Physics-Informed Transfer Learning for Data-Driven Sound Source Reconstruction in Near-Field Acoustic Holography

Xinmeng Luan, Mirco Pezzoli, Fabio Antonacci, Augusto Sarti · 2025

We propose a transfer learning framework for sound source reconstruction in Near-field Acoustic Holography (NAH), which adapts a well-trained data-driven model from one type of sound source to another…

Engineering Preprint PDF DOI

DIGS: Dynamic CBCT Reconstruction using Deformation-Informed 4D Gaussian Splatting and a Low-Rank Free-Form Deformation Model

Yuliang Huang, Imraj Singh, Thomas Joyce, Kris Thielemans, Jamie R. McClelland · 2025

3D Cone-Beam CT (CBCT) is widely used in radiotherapy but suffers from motion artifacts due to breathing. A common clinical approach mitigates this by sorting projections into respiratory phases and r…

Engineering Preprint PDF DOI

CodeDiffuser: Attention-Enhanced Diffusion Policy via VLM-Generated Code for Instruction Ambiguity

Guang Yin, Yitong Li, Yixuan Wang, Dale McConachie, Paarth Shah, Kunimatsu Hashimoto, Huan Zhang, Katherine Liu, Yunzhu Li · 2025

Natural language instructions for robotic manipulation tasks often exhibit ambiguity and vagueness. For instance, the instruction "Hang a mug on the mug tree" may involve multiple valid actions if the…

Engineering Preprint PDF DOI

Physics-Informed Neural Network-Driven Sparse Field Discretization Method for Near-Field Acoustic Holography

Xinmeng Luan, Mirco Pezzoli, Fabio Antonacci, Augusto Sarti · 2025

We propose the Physics-Informed Neural Network-driven Sparse Field Discretization method (PINN-SFD), a novel self-supervised, physics-informed deep learning approach for addressing the Near-Field Acou…

Engineering Preprint PDF DOI

UL-UNAS: Ultra-Lightweight U-Nets for Real-Time Speech Enhancement via Network Architecture Search

Xiaobin Rong, Leyan Yang, Dahan Wang, Yuxiang Hu, Changbao Zhu, Kai Chen, Jing Lu · 2025

Lightweight models are essential for real-time speech enhancement applications. In recent years, there has been a growing trend toward developing increasingly compact models for speech enhancement. In…

Engineering Preprint PDF DOI

Musical Score Following using Statistical Inference

Josephine Cowley · 2025

Musical score following is the real-time mapping of a performance to corresponding locations in a musical score. Score following can be used in a variety of applications including automatic page turni…

Engineering Preprint PDF DOI

Do We Need iPhone Moment or Xiaomi Moment for Robots? Design of Affordable Home Robots for Health Monitoring

Bo Wei, Yaya Bian, Mingcen Gao · 2024

In this paper, we study cost-effective home robot solutions which are designed for home health monitoring. The recent advancements in Artificial Intelligence (AI) have significantly advanced the capab…
