Visual Perception in Engineering — Research Repository

Engineering Preprint PDF DOI

ReFineVLA: Multimodal Reasoning-Aware Generalist Robotic Policies via Teacher-Guided Fine-Tuning

Tuan Van Vo, Tan Q. Nguyen, Khang Nguyen, Nhat Xuan Tran, Duy H. M. Nguyen, An T. Le, Ngo Anh Vien, Minh Nhat Vu · 2026

Vision-Language-Action (VLA) models have gained much attention from the research community thanks to their strength in translating multimodal observations with linguistic instructions into desired rob…

Building Low-Altitude Communication Networks: A Digital Twin-Based Optimization Framework

Boqun Huang, Yancheng Wang, Wei Guo, Zhaojie Guo, Di Wu, Ran Li, Dayang Liu, Wanshun Lan, Chuan Huang, Shuguang Cui · 2026

Low-altitude communication networks (LACNs) serve as the critical infrastructure of the emerging low-altitude economy (LAE), supporting services such as drone delivery and infrastructure inspection. H…

OmniVLA-RL: A Vision-Language-Action Model with Spatial Understanding and Online RL

Haoxiang Jie, Yaoyuan Yan, Xiangyu Wei, Kailin Wang, Hongjie Yan, Zhiyou Heng, Daocheng Chen · 2026

Visual-Language-Action (VLA) models represent a paradigm shift in embodied AI, yet existing frameworks often struggle with imprecise spatial perception, suboptimal multimodal fusion, and instability i…

Learned Nonlocal Feature Matching and Filtering for RAW Image Denoising

Marco Sanchez-Beeckman, Antoni Buades (IAC3 & Departament de Ciencies Matematiques i Informatica, Universitat de les Illes Balears) · 2026

Being one of the oldest and most basic problems in image processing, image denoising has seen a resurgence spurred by rapid advances in deep learning. Yet, most modern denoising architectures make lim…

Think before Go: Hierarchical Reasoning for Image-goal Navigation

Pengna Li, Kangyi Wu, Shaoqing Xu, Fang Li, Lin Zhao, Long Chen, Zhi-Xin Yang, Nanning Zheng · 2026

Image-goal navigation steers an agent to a target location specified by an image in unseen environments. Existing methods primarily handle this task by learning an end-to-end navigation policy, which …

Learning Whole-Body Humanoid Locomotion via Motion Generation and Motion Tracking

Zewei Zhang, Kehan Wen, Michael Xu, Junzhe He, Chenhao Li, Takahiro Miki, Clemens Schwarke, Chong Zhang, Xue Bin Peng, Marco Hutter · 2026

Whole-body humanoid locomotion is challenging due to high-dimensional control, morphological instability, and the need for real-time adaptation to various terrains using onboard perception. Directly a…

A Rapid Deployment Pipeline for Autonomous Humanoid Grasping Based on Foundation Models

Yifei Yan, Yankai Liao, Linqi Ye · 2026

Deploying a humanoid robot to manipulate a new object has traditionally required one to two days of effort: data collection, manual annotation, 3D model acquisition, and model training. This paper pre…

GaLa: Hypergraph-Guided Visual Language Models for Procedural Planning

Kun Wang, Yiming Li, Mingcheng Qu, Aqiang Zhang, Guang Yang, Tonghua Su · 2026

Implicit spatial relations and deep semantic structures encoded in object attributes are crucial for procedural planning in embodied AI systems. However, existing approaches often over-rely on the rea…

CADRE: Card-Agnostic Domain-Aligned RF Embeddings for Virtual PIN Pads on Passive NFC Cards

Dickson Akuoko Sarpong, Hongzhi Guo · 2026

Near Field Communication (NFC) cards are widely used for identification, but their passive nature often limits the ability to incorporate additional security mechanisms. As a result, anyone holding th…

Leveraging VR Robot Games to Facilitate Data Collection for Embodied Intelligence Tasks

Yihan Zhang, Ziyun Huang, Linqi Ye · 2026

Collecting embodied interaction data at scale remains costly and difficult due to the limited accessibility of conventional interfaces. We present a gamified data collection framework based on Unity t…

Chain Of Interaction Benchmark (COIN): When Reasoning meets Embodied Interaction

Xianhao Wang, Xiaojian Ma, Haozhe Hu, Rongpeng Su, Yutian Cheng, Zhou Ziheng, Hangxin Liu, Lei Liu, Bin Li, Qing Li · 2026

Generalist embodied agents must perform interactive, causally-dependent reasoning, continually interacting with the environment, acquiring information, and updating plans to solve long-horizon tasks b…

Watching Physics: the Generative Science of Matter and Motion

Hagen Holthusen, Kevin Linka, Ellen Kuhl · 2026

Can we learn the physics of matter in motion directly from images and video--and trust it? Answering this question requires integrating experiments, physics-based simulation, and data across tradition…

ReconVLA: An Uncertainty-Guided and Failure-Aware Vision-Language-Action Framework for Robotic Control

Lingling Chen, Zongyao Lyu, William J. Beksi · 2026

Vision-language-action (VLA) models have emerged as generalist robotic controllers capable of mapping visual observations and natural language instructions to continuous action sequences. However, VLA…

Human Cognition in Machines: A Unified Perspective of World Models

Timothy Rupprecht, Pu Zhao, Amir Taherin, Arash Akbari, Arman Akbari, Yumei He, Sean Duffy, Juyi Lin, Yixiao Chen, Rahul Chowdhury, Enfu Nan, Yixin Shen, Yifan Cao, Haochen Zeng, Weiwei Chen, Geng Yuan, Jennifer Dy, Sarah Ostadabbas, Silvia Zhang, David Kaeli, Edmund Yeh, Yanzhi Wang · 2026

This comprehensive report distinguishes prior works by the cognitive functions they innovate. Many works claim an almost "human-like" cognitive capability in their world models. To evaluate these clai…

DENALI: A Dataset Enabling Non-Line-of-Sight Spatial Reasoning with Low-Cost LiDARs

Nikhil Behari, Diego Rivero, Luke Apostolides, Suman Ghosh, Paul Pu Liang, Ramesh Raskar · 2026

Consumer LiDARs in mobile devices and robots typically output a single depth value per pixel. Yet internally, they record full time-resolved histograms containing direct and multi-bounce light returns…

Dual-Modal Lung Cancer AI: Interpretable Radiology and Microscopy with Clinical Risk Integration

Baramee Sukumal, Aueaphum Aueawatthanaphisut · 2026

Lung cancer remains one of the leading causes of cancer-related mortality worldwide. Conventional computed tomography (CT) imaging, while essential for detection and staging, has limitations in distin…

Unified Error Analysis of Multi-site Radar via Equivalent Angular Resolution

Lang Qin, Zelin Liu, Rongjie Li, Zhiqiang Huang, Xiaoguang Liu · 2026

High-precision indoor sensing using monostatic multiple-input multiple-output (MIMO) radar typically relies on increasing the physical aperture size of antennas, leading to high hardware complexity an…

VADF: Vision-Adaptive Diffusion Policy Framework for Efficient Robotic Manipulation

Xinglei Yu, Zhenyang Liu, Shufeng Nan, Simo Wu, Yanwei Fu · 2026

Diffusion policies are becoming mainstream in robotic manipulation but suffer from hard negative class imbalance due to uniform sampling and lack of sample difficulty awareness, leading to slow traini…

TV-Regularized Frequency-Domain Full-Waveform Inversion for Single-Sided Linear Ultrasound Array Data

Rui Guo, Ditza Auerbach, Yonina C. Eldar · 2026

Quantitative speed-of-sound (SoS) and attenuation of tissues are closely related to pathology; however, conventional B-mode images are limited to qualitative visualization. Existing ultrasound full-wa…

When structure does not imply symmetry

Skyler R. St. Pierre, Thibault Vervenne, Ethan C. Darwin, Ellen Kuhl · 2026

Fungal protein materials exhibit inherently anisotropic microstructures formed by networks of hyphae, which suggest a natural pathway to replicate the fibrous texture of animal meat. We probe whether …
