Chunyi Zhang in Engineering — Research Repository

Engineering Preprint PDF DOI

DiscreteRTC: Discrete Diffusion Policies are Natural Asynchronous Executors

Pengcheng Wang, Kaiwen Hong, Chensheng Peng, Katherine Driggs-Campbell, Masayoshi Tomizuka, Chenfeng Xu, Chen Tang · 2026

Unlike chatbots, physical AI must act while the world keeps evolving. Therefore, the inter-chunk pause of synchronous executors is fatal for dynamic tasks regardless of how fast the inference is. Asy…

Learning Human-Intention Priors from Large-Scale Human Demonstrations for Robotic Manipulation

Yifan Xie, YuAn Wang, Guangyu Chen, Jinkun Liu, Yu Sun, Wenbo Ding · 2026

Human videos contain rich manipulation priors, but using them for robot learning remains difficult because raw observations entangle scene understanding, human motion, and embodiment-specific action. …

Tube Diffusion Policy: Reactive Visual-Tactile Policy Learning for Contact-rich Manipulation

Teng Xue, Alberto Rigo, Bingjian Huang, Jiayi Shen, Zhengtong Xu, Nick Colonnese, Amirhossein H. Memar · 2026

Contact-rich manipulation is central to many everyday human activities, requiring continuous adaptation to contact uncertainty and external disturbances through multi-modal perception, particularly vi…

DM-ASR: Diarization-aware Multi-speaker ASR with Large Language Models

Li Li, Ming Cheng, Weixin Zhu, Yannan Wang, Juan Liu, Ming Li · 2026

Multi-speaker automatic speech recognition (ASR) aims to transcribe conversational speech involving multiple speakers, requiring the model to capture not only what was said, but also who said it and s…

Navigating the Clutter: Waypoint-Based Bi-Level Planning for Multi-Robot Systems

Jiabao Ji, Yongchao Chen, Yang Zhang, Ramana Rao Kompella, Chuchu Fan, Gaowen Liu, Shiyu Chang · 2026

Multi-robot control in cluttered environments is a challenging problem that involves complex physical constraints, including robot-robot collisions, robot-obstacle collisions, and unreachable motions.…

Reducing the Offline-Streaming Gap for Unified ASR Transducer with Consistency Regularization

Andrei Andrusenko, Vladimir Bataev, Lilit Grigoryan, Nune Tadevosyan, Vitaly Lavrukhin, Boris Ginsburg · 2026

Unification of automatic speech recognition (ASR) systems reduces development and maintenance costs, but training a single model to perform well in both offline and low-latency streaming settings rema…

Relative State Estimation using Event-Based Propeller Sensing

Ravi Kumar Thakur, Luis Granados Segura, Jan Klivan, Radim Spetlik, Tobias Vinklarek, Matous Vrba, Martin Saska · 2026

An autonomous swarm of multiple Unmanned Aerial Vehicles (UAVs) requires accurate and fast relative state estimation. Although monocular frame-based camera methods perform well in ideal conditions,…

ST-$\pi$: Structured SpatioTemporal VLA for Robotic Manipulation

Chuanhao Ma, Hanyu Zhou, Shihan Peng, Yan Li, Tao Gu, Luxin Yan · 2026

Vision-language-action (VLA) models have achieved great success on general robotic tasks, but still face challenges in fine-grained spatiotemporal manipulation. Typically, existing methods mainly embe…

Rewind-IL: Online Failure Detection and State Respawning for Imitation Learning

Gehan Zheng, Sanjay Seenivasan, Matthew Johnson-Roberson, Weiming Zhi · 2026

Imitation learning has enabled robots to acquire complex visuomotor manipulation skills from demonstrations, but deployment failures remain a major obstacle, especially for long-horizon action-chunked…

Data-Driven Reachability Analysis Using Matrix Perturbation Theory

Peng Xie, Abdulla Fawzy, Zhen Zhang, Amr Alanwar · 2026

We propose a matrix zonotope perturbation framework that leverages matrix perturbation theory to characterize how noise-induced distortions alter the dynamics within sets of models. The framework deri…

Learning Versatile Humanoid Manipulation with Touch Dreaming

Yaru Niu, Zhenlong Fang, Binghong Chen, Shuai Zhou, Revanth Krishna Senthilkumaran, Hao Zhang, Bingqing Chen, Chen Qiu, H. Eric Tseng, Jonathan Francis, Ding Zhao · 2026

Humanoid robots promise general-purpose assistance, yet real-world humanoid loco-manipulation remains challenging because it requires whole-body stability, end-effector dexterity, and contact-aware in…

Whole-Body Mobile Manipulation using Offline Reinforcement Learning on Sub-optimal Controllers

Snehal Jauhri, Vignesh Prasad, Georgia Chalvatzaki · 2026

Mobile Manipulation (MoMa) of articulated objects, such as opening doors, drawers, and cupboards, demands simultaneous, whole-body coordination between a robot's base and arms. Classical whole-body co…

Second Order Physics-Informed Learning of Road Density using Probe Vehicles

S. Betancur Giraldo, J. Mårtensson, M. Barreau · 2026

We propose a Physics-Informed Learning framework for reconstructing traffic density from sparse trajectory data. The approach combines a second-order Aw-Rascle and Zhang model with a first-order train…

HiPolicy: Hierarchical Multi-Frequency Action Chunking for Policy Learning

Jiyao Zhang, Zimu Han, Junhan Wang, Xionghao Wu, Shihong Lin, Jinzhou Li, Hongwei Fan, Ruihai Wu, Dongjiang Li, Hao Dong · 2026

Robotic imitation learning faces a fundamental trade-off between modeling long-horizon dependencies and enabling fine-grained closed-loop control. Existing fixed-frequency action chunking approaches s…

Adaptive Action Chunking at Inference-time for Vision-Language-Action Models

Yuanchang Liang, Xiaobo Wang, Kai Wang, Shuo Wang, Xiaojiang Peng, Haoyu Chen, David Kim Huat Chua, Prahlad Vadakkepat · 2026

In Vision-Language-Action (VLA) models, action chunking (i.e., executing a sequence of actions without intermediate replanning) is a key technique to improve robotic manipulation abilities. However, a…

Open-Loop Planning, Closed-Loop Verification: Speculative Verification for VLA

Zihua Wang, Zhitao Lin, Ruibo Li, Yu Zhang, Xu Yang, Siya Mi, Xiu-Shen Wei · 2026

Vision-Language-Action (VLA) models, as large foundation models for embodied control, have shown strong performance in manipulation tasks. However, their performance comes at high inference cost. To i…

Posterior Optimization with Clipped Objective for Bridging Efficiency and Stability in Generative Policy Learning

Yuhui Chen, Haoran Li, Zhennan Jiang, Yuxing Qin, Yuxuan Wan, Weiheng Liu, Dongbin Zhao · 2026

Expressive generative models have advanced robotic manipulation by capturing complex, multi-modal action distributions over temporally extended trajectories. However, fine-tuning these policies via RL…

StreamingVLA: Streaming Vision-Language-Action Model with Action Flow Matching and Adaptive Early Observation

Yiran Shi, Dongqi Guo, Tianchen Zhao, Feng Gao, Liangzhi Shi, Chao Yu, ZhiJian Mo, Qihua Xiao, XiaoShuai Peng, Qingmin Liao, Yu Wang · 2026

Vision-language-action (VLA) models have demonstrated exceptional performance in natural language-driven perception and control. However, the high computational cost of VLA models poses significant ef…

HiFlow: Tokenization-Free Scale-Wise Autoregressive Policy Learning via Flow Matching

Daichi Yashima, Koki Seno, Shuhei Kurita, Yusuke Oda, Komei Sugiura · 2026

Coarse-to-fine autoregressive modeling has recently shown strong promise for visuomotor policy learning, combining the inference efficiency of autoregressive methods with the global trajectory coheren…

MMaDA-VLA: Large Diffusion Vision-Language-Action Model with Unified Multi-Modal Instruction and Generation

Yang Liu, Pengxiang Ding, Tengyue Jiang, Xudong Wang, Wenxuan Song, Minghui Lin, Han Zhao, Hongyin Zhang, Zifeng Zhuang, Wei Zhao, Siteng Huang, Jinkui Shi, Donglin Wang · 2026

Vision-Language-Action (VLA) models aim to control robots for manipulation from visual observations and natural-language instructions. However, existing hierarchical and autoregressive paradigms often…
