Xiaobin Huang in Engineering — Research Repository

Engineering Preprint PDF DOI

UniPASE: A Generative Model for Universal Speech Enhancement with High Fidelity and Low Hallucinations

Xiaobin Rong, Zheng Wang, Yushi Wang, Jun Gao, Jing Lu · 2026

Universal speech enhancement (USE) aims to restore speech signals from diverse distortions across multiple sampling rates. We propose UniPASE, an extension of the low-hallucination PASE framework tail…

GAP-URGENet: A Generative-Predictive Fusion Framework for Universal Speech Enhancement

Xiaobin Rong, Yushi Wang, Zheng Wang, Jing Lu · 2026

We introduce GAP-URGENet, a generative-predictive fusion framework developed for Track 1 of the ICASSP 2026 URGENT Challenge. The system integrates a generative branch, which performs full-stack speec…

Agent-Driven Autonomous Reinforcement Learning Research: Iterative Policy Improvement for Quadruped Locomotion

Nimesh Khandelwal, Shakti S. Gupta · 2026

This paper documents a case study in agent-driven autonomous reinforcement learning research for quadruped locomotion. The setting was not a fully self-starting research system. A human provided high-…

ACAVCaps: Enabling large-scale training for fine-grained and diverse audio understanding

Yadong Niu, Tianzi Wang, Heinrich Dinkel, Xingwei Sun, Jiahao Zhou, Gang Li, Jizhong Liu, Junbo Zhang, Jian Luan · 2026

General audio understanding is a fundamental goal for large audio-language models, with audio captioning serving as a cornerstone task for their development. However, progress in this domain is hinder…

StuPASE: Towards Low-Hallucination Studio-Quality Generative Speech Enhancement

Xiaobin Rong, Jun Gao, Zheng Wang, Mansur Yesilbursa, Kamil Wojcicki, Jing Lu · 2026

Achieving high perceptual quality without hallucination remains a challenge in generative speech enhancement (SE). A representative approach, PASE, is robust to hallucination but has limited perceptua…

Physics-Informed Anomaly Detection of Terrain Material Change in Radar Imagery

Abdel Hakiem Mohamed Abbas Mohamed Ahmed, Beth Jelfs, Airlie Chapman, Eric Schoof, Christopher Gilliam · 2026

In this paper we consider physics-informed detection of terrain material change in radar imagery (e.g., shifts in permittivity, roughness or moisture). We propose a lightweight electromagnetic (EM) fo…

Xiaomi-Robotics-0: An Open-Sourced Vision-Language-Action Model with Real-Time Execution

Rui Cai, Jun Guo, Xinze He, Piaopiao Jin, Jie Li, Bingxuan Lin, Futeng Liu, Wei Liu, Fei Ma, Kun Ma, Feng Qiu, Heng Qu, Yifei Su, Qiao Sun, Dong Wang, Donghao Wang, Yunhong Wang, Rujie Wu, Diyun Xiang, Yu Yang, Hangjun Ye, Yuan Zhang, Quanyun Zhou · 2026

In this report, we introduce Xiaomi-Robotics-0, an advanced vision-language-action (VLA) model optimized for high performance and fast and smooth real-time execution. The key to our method lies in a c…

MS-PPO: Morphological-Symmetry-Equivariant Policy for Legged Robot Locomotion

Sizhe Wei, Xulin Chen, Fengze Xie, Garrett Ethan Katz, Zhenyu Gan, Lu Gan · 2025

Reinforcement learning has recently enabled impressive locomotion capabilities on legged robots; however, most policy architectures remain morphology- and symmetry-agnostic, leading to inefficient tra…

One-shot Adaptation of Humanoid Whole-body Motion with Walking Priors

Hao Huang, Geeta Chandra Raju Bethala, Shuaihang Yuan, Congcong Wen, Mengyu Wang, Anthony Tzes, Yi Fang · 2025

Whole-body humanoid motion represents a fundamental challenge in robotics, requiring balance, coordination, and adaptability to enable human-like behaviors. However, existing methods typically require…

Single-Rod Brachiation Robot: Mechatronic Control Design and Validation of Prejump Phases

Juraj Lieskovsky, Hijiri Akahane, Aoto Osawa, Jaroslav Busek, Ikuo Mizuuchi, Tomas Vyhlidal · 2025

A complete mechatronic design of a minimal configuration brachiation robot is presented. The robot consists of a single rigid rod with gripper mechanisms attached to both ends. The grippers are used t…

Team Xiaomi EV-AD VLA: Caption-Guided Retrieval System for Cross-Modal Drone Navigation -- Technical Report for IROS 2025 RoboSense Challenge Track 4

Lingfeng Zhang, Erjia Xiao, Yuchen Zhang, Haoxiang Fu, Ruibin Hu, Yanbiao Ma, Wenbo Ding, Long Chen, Hangjun Ye, Xiaoshuai Hao · 2025

Cross-modal drone navigation remains a challenging task in robotics, requiring efficient retrieval of relevant images from large-scale databases based on natural language descriptions. The RoboSense 2…

MECAT: A Multi-Experts Constructed Benchmark for Fine-Grained Audio Understanding Tasks

Yadong Niu, Tianzi Wang, Heinrich Dinkel, Xingwei Sun, Jiahao Zhou, Gang Li, Jizhong Liu, Xunying Liu, Junbo Zhang, Jian Luan · 2025

While large audio-language models have advanced open-ended audio understanding, they still fall short of nuanced human-level comprehension. This gap persists largely because current benchmarks, limite…

DIGS: Dynamic CBCT Reconstruction using Deformation-Informed 4D Gaussian Splatting and a Low-Rank Free-Form Deformation Model

Yuliang Huang, Imraj Singh, Thomas Joyce, Kris Thielemans, Jamie R. McClelland · 2025

3D Cone-Beam CT (CBCT) is widely used in radiotherapy but suffers from motion artifacts due to breathing. A common clinical approach mitigates this by sorting projections into respiratory phases and r…

CodeDiffuser: Attention-Enhanced Diffusion Policy via VLM-Generated Code for Instruction Ambiguity

Guang Yin, Yitong Li, Yixuan Wang, Dale McConachie, Paarth Shah, Kunimatsu Hashimoto, Huan Zhang, Katherine Liu, Yunzhu Li · 2025

Natural language instructions for robotic manipulation tasks often exhibit ambiguity and vagueness. For instance, the instruction "Hang a mug on the mug tree" may involve multiple valid actions if the…

UL-UNAS: Ultra-Lightweight U-Nets for Real-Time Speech Enhancement via Network Architecture Search

Xiaobin Rong, Leyan Yang, Dahan Wang, Yuxiang Hu, Changbao Zhu, Kai Chen, Jing Lu · 2025

Lightweight models are essential for real-time speech enhancement applications. In recent years, there has been a growing trend toward developing increasingly compact models for speech enhancement. In…

Do We Need iPhone Moment or Xiaomi Moment for Robots? Design of Affordable Home Robots for Health Monitoring

Bo Wei, Yaya Bian, Mingcen Gao · 2024

In this paper, we study cost-effective home robot solutions which are designed for home health monitoring. The recent advancements in Artificial Intelligence (AI) have significantly advanced the capab…

ScissorBot: Learning Generalizable Scissor Skill for Paper Cutting via Simulation, Imitation, and Sim2Real

Jiangran Lyu, Yuxing Chen, Tao Du, Feng Zhu, Huiquan Liu, Yizhou Wang, He Wang · 2024

This paper tackles the challenging robotic task of generalizable paper cutting using scissors. In this task, scissors attached to a robot arm are driven to accurately cut curves drawn on the paper, wh…

Cross-Slice Attention and Evidential Critical Loss for Uncertainty-Aware Prostate Cancer Detection

Alex Ling Yu Hung, Haoxin Zheng, Kai Zhao, Kaifeng Pang, Demetri Terzopoulos, Kyunghyun Sung · 2024

Current deep learning-based models typically analyze medical images in either 2D or 3D albeit disregarding volumetric information or suffering sub-optimal performance due to the anisotropic resolution…

Autonomous Robot for Disaster Mapping and Victim Localization

Michael Potter, Rahil Bhowal, Richard Zhao, Anuj Patel, Jingming Cheng · 2024

In response to the critical need for effective reconnaissance in disaster scenarios, this research article presents the design and implementation of a complete autonomous robot system using the Turtle…

Comparison of different methods for identification of dominant oscillation mode

Maja Muftic Dedovic, Samir Avdakovic, Adnan Mujezinovic, Nedis Dautbasic · 2024

This paper introduces and compares the various techniques for identification and analysis of low frequency oscillations in a power system. Inter-area electromechanical oscillations are the focus of th…
