Visual Perception in Engineering — Research Repository

Engineering Preprint

NeuroMesh: A Unified Neural Inference Framework for Decentralized Multi-Robot Collaboration

Yang Zhou, Yash Shetye, Long Quang, Devon Super, Jesse Milzman, Manohari Goarin, Aditya Azad, Devang Sunil Dhake, Jeffery Mao, Carlos Nieto-Granda, Giuseppe Loianno · 2026

Deploying learned multi-robot models on heterogeneous robots remains challenging due to hardware heterogeneity, communication constraints, and the lack of a unified execution stack. This paper present…

Engineering Preprint

Eccentricity Confound in EEG-based Visual Attention Decoding from Gaze-Fixated Neural Tracking of Motion in Natural Videos

Yuanyuan Yao, Celina Salamanca Gonzalez, Simon Geirnaert, Celine R. Gillebert, Tinne Tuytelaars, Alexander Bertrand · 2026

Objective. Decoding visual attention from brain signals during naturalistic video viewing has emerged as a new direction in brain-computer interface research. Current methods assume that stronger coup…

Engineering Preprint

Dual Pose-Graph Semantic Localization for Vision-Based Autonomous Drone Racing

David Perez-Saura, Miguel Fernandez-Cortizas, Alvaro J. Gaona, Pascual Campoy · 2026

Autonomous drone racing demands robust real-time localization under extreme conditions: high-speed flight, aggressive maneuvers, and payload-constrained platforms that often rely on a single camera fo…

Engineering Preprint

Trajectory Planning for a Multi-UAV Rigid-Payload Cascaded Transportation System Based on Enhanced Tube-RRT*

Jianqiao Yu, Jia Li, Tianhua Gao · 2026

This paper presents a two-stage trajectory planning framework for a multi-UAV rigid-payload cascaded transportation system, aiming to address planning challenges in densely cluttered environments. In …

Engineering Preprint

CAVERS: Multimodal SLAM Data from a Natural Karstic Cave with Ground Truth Motion Capture

Giacomo Franchini, David Rodriguez-Martinez, Alfonso Martinez-Petersen, C. J. Perez-del-Pulgar, Marcello Chiaberge · 2026

Autonomous robots operating in natural karstic caves face perception and navigation challenges that are qualitatively distinct from those encountered in mines or tunnels: irregular geometry, reflectiv…

Engineering Preprint

DockAnywhere: Data-Efficient Visuomotor Policy Learning for Mobile Manipulation via Novel Demonstration Generation

Ziyu Shan, Yuheng Zhou, Gaoyuan Wu, Ziheng Ji, Zhenyu Wu, Ziwei Wang · 2026

Mobile manipulation is a fundamental capability that enables robots to interact in expansive environments such as homes and factories. Most existing approaches follow a two-stage paradigm, where the r…

Engineering Preprint

Momentum-constrained Hybrid Heuristic Trajectory Optimization Framework with Residual-enhanced DRL for Visually Impaired Scenarios

Yuting Zeng, Zhiwen Zheng, Jingya Wang, You Zhou, JiaLing Xiao, Yongbin Yu, Manping Fan, Bo Gong, Liyong Ren · 2026

Safe and efficient assistive planning for visually impaired scenarios remains challenging, since existing methods struggle with multi-objective optimization, generalization, and interpretability. In r…

Engineering Preprint

POMDP-based Object Search with Growing State Space and Hybrid Action Domain

Yongbo Chen, Hesheng Wang, Shoudong Huang, Hanna Kurniawati · 2026

Efficiently locating target objects in complex indoor environments with diverse furniture, such as shelves, tables, and beds, is a significant challenge for mobile robots. This difficulty arises from …

Engineering Preprint

HRDexDB: A Large-Scale Dataset of Dexterous Human and Robotic Hand Grasps

Jongbin Lim, Taeyun Ha, Mingi Choi, Jisoo Kim, Byungjun Kim, Subin Jeon, Hanbyul Joo · 2026

We present HRDexDB, a large-scale, multi-modal dataset of high-fidelity dexterous grasping sequences featuring both human and diverse robotic hands. Unlike existing datasets, HRDexDB provides a compre…

Engineering Preprint

Keep It CALM: Toward Calibration-Free Kilometer-Level SLAM with Visual Geometry Foundation Models via an Assistant Eye

Tianjun Zhang, Fengyi Zhang, Tianchen Deng, Lin Zhang, Hesheng Wang · 2026

Visual Geometry Foundation Models (VGFMs) demonstrate remarkable zero-shot capabilities in local reconstruction. However, deploying them for kilometer-level Simultaneous Localization and Mapping (SLAM…

Engineering Preprint

World-Value-Action Model: Implicit Planning for Vision-Language-Action Systems

Runze Li, Hongyin Zhang, Junxi Jin, Qixin Zeng, Zifeng Zhuang, Yiqi Tang, Shangke Lyu, Donglin Wang · 2026

Vision-Language-Action (VLA) models have emerged as a promising paradigm for building embodied agents that ground perception and language into action. However, most existing approaches rely on direct …

Engineering Preprint

CT-VIR: Continuous-Time Visual-Inertial-Ranging Fusion for Indoor Localization with Sparse Anchors

Yu-An Liu, Li Zhang · 2026

Visual-inertial odometry (VIO) is widely used for mobile robot localization, but its long-term accuracy degrades without global constraints. Incorporating ranging sensors such as ultra-wideband (UWB) …

Engineering Preprint

Quantification and Regulation of Energy Reserves for Distributed Frequency and Voltage Control of Grid-Forming Inverters

Ahmed Saad Al-Karsani, Maryam Khanbaghi · 2026

The introduction of Renewable Energy Sources (RES) and Distributed Energy Resources (DERs) has led to the formulation of Microgrids (MGs) and Networks of MGs (NMGs). MGs and NMGs can operate in island…

Engineering Preprint

CooperDrive: Enhancing Driving Decisions Through Cooperative Perception

Deyuan Qu, Qi Chen, Takayuki Shimizu, Onur Altintas · 2026

Autonomous vehicles equipped with robust onboard perception, localization, and planning still face limitations in occlusion and non-line-of-sight (NLOS) scenarios, where delayed reactions can increase…

Engineering Preprint

SpaceMind: A Modular and Self-Evolving Embodied Vision-Language Agent Framework for Autonomous On-orbit Servicing

Aodi Wu, Haodong Han, Xubo Luo, Ruisuo Wang, Shan He, Xue Wan · 2026

Autonomous on-orbit servicing demands embodied agents that perceive through visual sensors, reason about 3D spatial situations, and execute multi-phase tasks over extended horizons. We present SpaceMi…

Engineering Preprint

CART: Context-Aware Terrain Adaptation using Temporal Sequence Selection for Legged Robots

Kartikeya Singh, Youngjin Kim, Yash Turkar, Karthik Dantu · 2026

Animals in nature combine multiple modalities, such as sight and feel, to perceive terrain and develop an understanding of how to walk on uneven terrain in a stable manner. Similarly, legged robots ne…

Engineering Preprint

UMI-3D: Extending Universal Manipulation Interface from Vision-Limited to 3D Spatial Perception

Ziming Wang · 2026

We present UMI-3D, a multimodal extension of the Universal Manipulation Interface (UMI) for robust and scalable data collection in embodied manipulation. While UMI enables portable, wrist-mounted data…

Engineering Preprint

On-Orbit Space AI: Federated, Multi-Agent, and Collaborative Algorithms for Satellite Constellations

Ziyang Wang · 2026

Satellite constellations are transforming space systems from isolated spacecraft into networked, software-defined platforms capable of on-orbit perception, decision making, and adaptation. Yet much of…

Engineering Preprint

mosaiks are made of tesserae: GUI design for a co-simulation framework

Eike Schulte, Jan Soren Schwarz, Malte Stomberg, Sharaf Alsharif, Danila Valko, Jirapa Kamsamsong · 2026

In a mosaic, a tessera is a single stone. We introduce tesserae for the co-simulation framework mosaik, where they are sets of entities. They allow for a visual, intuitive, and yet systematic descript…

Engineering Preprint

Vision-and-Language Navigation for UAVs: Progress, Challenges, and a Research Roadmap

Hanxuan Chen, Jie Zheng, Siqi Yang, Tianle Zeng, Siwei Feng, Songsheng Cheng, Ruilong Ren, Hanzhong Guo, Shuai Yuan, Xiangyue Wang, Kangli Wang, Ji Pei · 2026

Vision-and-Language Navigation for Unmanned Aerial Vehicles (UAV-VLN) represents a pivotal challenge in embodied artificial intelligence, focused on enabling UAVs to interpret high-level human command…
