Visual Perception in Engineering — Research Repository

Engineering Preprint PDF DOI

Mask World Model: Predicting What Matters for Robust Robot Policy Learning

Yunfan Lou, Xiaowei Chi, Xiaojie Zhang, Zezhong Qian, Chengxuan Li, Rongyu Zhang, Yaoxu Lyu, Guoyu Song, Chuyao Fu, Haoxuan Xu, Pengwei Wang, Shanghang Zhang · 2026

World models derived from large-scale video generative pre-training have emerged as a promising paradigm for generalist robot policy learning. However, standard approaches often focus on high-fidelity…

A Gesture-Based Visual Learning Model for Acoustophoretic Interactions using a Swarm of AcoustoBots

Alex Lin, Lei Gao, Narsimlu Kemsaram, Sriram Subramanian · 2026

AcoustoBots are mobile acoustophoretic robots capable of delivering mid-air haptics, directional audio, and acoustic levitation, but existing implementations rely on scripted commands and lack an intu…

Autonomous UAV Pipeline Near-proximity Inspection via Disturbance-Aware Predictive Visual Servoing

Wen Li, Hui Wang, Jinya Su, Cunjia Liu, Wen-Hua Chen, Shihua Li · 2026

Reliable pipeline inspection is critical to safe energy transportation, but is constrained by long distances, complex terrain, and risks to human inspectors. Unmanned aerial vehicles provide a flexibl…

GenerativeMPC: VLM-RAG-guided Whole-Body MPC with Virtual Impedance for Bimanual Mobile Manipulation

Marcelino Julio Fernando, Miguel Altamirano Cabrera, Jeffrin Sam, Yara Mahmoud, Konstantin Gubernatorov, Dzmitry Tsetserukou · 2026

Bimanual mobile manipulation requires a seamless integration between high-level semantic reasoning and safe, compliant physical interaction, a challenge that end-to-end models approach opaquely and c…

Achieving Interaction Fluidity in a Wizard-of-Oz Robotic System: A Prototype for Fluid Error-Correction

Carlos Baptista De Lima, Julian Hough, Frank Forster, Patrick Holthaus, Yongjun Zheng · 2026

Achieving truly fluid interaction with robots with speech interfaces remains a hard problem, and the experience of current Human-Robot Interaction (HRI) remains laboured and frustrating. Some of the b…

Quadruped Parkour Learning: Sparsely Gated Mixture of Experts with Visual Input

Michael Ziegltrum, Jianhao Jiao, Tianhu Peng, Chengxu Zhou, Dimitrios Kanoulas · 2026

Robotic parkour provides a compelling benchmark for advancing locomotion over highly challenging terrain, including large discontinuities such as elevated steps. Recent approaches have demonstrated im…

Warmth and Competence in the Swarm: Designing Effective Human-Robot Teams

Genki Miyauchi, Roderich Groß, Chaona Chen · 2026

As groups of robots increasingly collaborate with humans, understanding how humans perceive them is critical for designing effective human-robot teams. While prior research examined how humans interpr…

RoboWM-Bench: A Benchmark for Evaluating World Models in Robotic Manipulation

Feng Jiang, Yang Chen, Kyle Xu, Yuchen Liu, Haifeng Wang, Zhenhao Shen, Jasper Lu, Shengze Huang, Yuanfei Wang, Chen Xie, Ruihai Wu · 2026

Recent advances in large-scale video world models have enabled increasingly realistic future prediction, raising the prospect of leveraging imagined videos for robot learning. However, visual realism …

AeroBridge-TTA: Test-Time Adaptive Language-Conditioned Control for UAVs

Lingxue Lyu · 2026

Language-guided unmanned aerial vehicles (UAVs) often fail not from bad reasoning or perception, but from execution mismatch: the gap between a planned trajectory and the controller's ability to tra…

RoomRecon: High-Quality Textured Room Layout Reconstruction on Mobile Devices

Seok Joon Kim, Dinh Duc Cao, Federica Spinola, Se Jin Lee, Kyu Sung Cho · 2026

Widespread RGB-Depth (RGB-D) sensors and advanced 3D reconstruction technologies facilitate the capture of indoor spaces, improving the fields of augmented reality (AR), virtual reality (VR), and exte…

Inertia Matching Principle: Improving Transient Synchronization Stability in Hybrid Power Systems With VSGs and SGs

Changjun He, Li Zhang, Qi Liu, Rui Zou · 2026

This paper investigates the transient synchronization stability in power systems hybridized with virtual synchronous generators (VSGs) and synchronous generators (SGs). A relative swing equation model…

AI-Enabled Image-Based Hybrid Vision/Force Control of Tendon-Driven Aerial Continuum Manipulators

Shayan Sepahvand, Farrokh Janabi-Sharifi, Farhad Aghili · 2026

This paper presents an AI-enabled cascaded hybrid vision/force control framework for tendon-driven aerial continuum manipulators based on constant-strain modeling in SE(3) as a coupled system. The p…

Hybrid SMI Realization via Matrix Completion and Riemannian Manifold Optimization on Narrowband Sub-Array Based Architectures

Tarun Suman Cousik, Rohit Rangaraj, Nishith Tripathi, Jeffrey H Reed, Daniel Jakubisin, Jon Kraft · 2026

Hybrid beamforming architectures reduce hardware complexity but restrict access to full array observations, rendering direct implementation of classical covariance-based methods such as minimum varian…

A Controlled Benchmark of Visual State-Space Backbones with Domain-Shift and Boundary Analysis for Remote-Sensing Segmentation

Nichula Wasalathilaka, Dineth Perera, Oshadha Samarakoon, Buddhi Wijenayake, Roshan Godaliyadda, Vijitha Herath, Parakrama Ekanayake · 2026

Visual state-space models (SSMs) are increasingly promoted as efficient alternatives to Vision Transformers, yet their practical advantages remain unclear under fair comparison because existing studie…

EmbodiedLGR: Integrating Lightweight Graph Representation and Retrieval for Semantic-Spatial Memory in Robotic Agents

Paolo Riva, Leonardo Gargani, Matteo Frosi, Matteo Matteucci · 2026

As the world of agentic artificial intelligence applied to robotics evolves, the need for agents capable of building and retrieving memories and observations efficiently is increasing. Robots operatin…

Leader-Follower Formation Control Using Differential Drag and Effective Surface Regulation

Alessio Bocci, Jose Juan Corona-Sanchez, Raymond Kristiansen · 2026

The growing interest in space activities has led to the emergence of new space operators and innovative mission concepts. Small satellites such as CubeSats reduce mission costs and are typically deplo…

SpaceDex: Generalizable Dexterous Grasping in Tiered Workspaces

Wensheng Wang, Chuanjun Guo, Wei Wei, Tong Wu, Ning Tan · 2026

Generalizable grasping with high-degree-of-freedom (DoF) dexterous hands remains challenging in tiered workspaces, where occlusion, narrow clearances, and height-dependent constraints are substantiall…

StableIDM: Stabilizing Inverse Dynamics Model against Manipulator Truncation via Spatio-Temporal Refinement

Kerui Li, Zhe Jing, Xiaofeng Wang, Zheng Zhu, Yukun Zhou, Guan Huang, Dongze Li, Qingkai Yang, Huaibo Huang · 2026

Inverse Dynamics Models (IDMs) map visual observations to low-level action commands, serving as central components for data labeling and policy execution in embodied AI. However, their performance deg…

ST-π: Structured SpatioTemporal VLA for Robotic Manipulation

Chuanhao Ma, Hanyu Zhou, Shihan Peng, Yan Li, Tao Gu, Luxin Yan · 2026

Vision-language-action (VLA) models have achieved great success on general robotic tasks, but still face challenges in fine-grained spatiotemporal manipulation. Typically, existing methods mainly embe…

SYMBOLIZER: Symbolic Model-free Task Planning with VLMs

Sami Azirar, Zlatan Ajanovic, Hermann Blum · 2026

Traditional Task and Motion Planning (TAMP) systems depend on physics models for motion planning and discrete symbolic models for task planning. Although physics models are often available, symbolic mo…
