Expertini Research

Browse Research Papers

14,737+ open-access research outputs.

Showing 14737 results for "visual perception" in Engineering
Engineering Preprint PDF DOI

ReFineVLA: Multimodal Reasoning-Aware Generalist Robotic Policies via Teacher-Guided Fine-Tuning

Tuan Van Vo, Tan Q. Nguyen, Khang Nguyen, Nhat Xuan Tran, Duy H. M. Nguyen, An T. Le, Ngo Anh Vien, Minh Nhat Vu · 2026

Vision-Language-Action (VLA) models have gained much attention from the research community thanks to their strength in translating multimodal observations with linguistic instructions into desired rob…

Engineering Preprint PDF DOI

Building Low-Altitude Communication Networks: A Digital Twin-Based Optimization Framework

Boqun Huang, Yancheng Wang, Wei Guo, Zhaojie Guo, Di Wu, Ran Li, Dayang Liu, Wanshun Lan, Chuan Huang, Shuguang Cui · 2026

Low-altitude communication networks (LACNs) serve as the critical infrastructure of the emerging low-altitude economy (LAE), supporting services such as drone delivery and infrastructure inspection. H…

Engineering Preprint PDF DOI

OmniVLA-RL: A Vision-Language-Action Model with Spatial Understanding and Online RL

Haoxiang Jie, Yaoyuan Yan, Xiangyu Wei, Kailin Wang, Hongjie Yan, Zhiyou Heng, Daocheng Chen · 2026

Visual-Language-Action (VLA) models represent a paradigm shift in embodied AI, yet existing frameworks often struggle with imprecise spatial perception, suboptimal multimodal fusion, and instability i…

Engineering Preprint PDF DOI

Learned Nonlocal Feature Matching and Filtering for RAW Image Denoising

Marco Sanchez-Beeckman, Antoni Buades (IAC3 & Departament de Ciencies Matematiques i Informatica, Universitat de les Illes Balears) · 2026

Being one of the oldest and most basic problems in image processing, image denoising has seen a resurgence spurred by rapid advances in deep learning. Yet, most modern denoising architectures make lim…

Engineering Preprint PDF DOI

Think before Go: Hierarchical Reasoning for Image-goal Navigation

Pengna Li, Kangyi Wu, Shaoqing Xu, Fang Li, Lin Zhao, Long Chen, Zhi-Xin Yang, Nanning Zheng · 2026

Image-goal navigation steers an agent to a target location specified by an image in unseen environments. Existing methods primarily handle this task by learning an end-to-end navigation policy, which …

Engineering Preprint PDF DOI

Learning Whole-Body Humanoid Locomotion via Motion Generation and Motion Tracking

Zewei Zhang, Kehan Wen, Michael Xu, Junzhe He, Chenhao Li, Takahiro Miki, Clemens Schwarke, Chong Zhang, Xue Bin Peng, Marco Hutter · 2026

Whole-body humanoid locomotion is challenging due to high-dimensional control, morphological instability, and the need for real-time adaptation to various terrains using onboard perception. Directly a…

Engineering Preprint PDF DOI

A Rapid Deployment Pipeline for Autonomous Humanoid Grasping Based on Foundation Models

Yifei Yan, Yankai Liao, Linqi Ye · 2026

Deploying a humanoid robot to manipulate a new object has traditionally required one to two days of effort: data collection, manual annotation, 3D model acquisition, and model training. This paper pre…

Engineering Preprint PDF DOI

GaLa: Hypergraph-Guided Visual Language Models for Procedural Planning

Kun Wang, Yiming Li, Mingcheng Qu, Aqiang Zhang, Guang Yang, Tonghua Su · 2026

Implicit spatial relations and deep semantic structures encoded in object attributes are crucial for procedural planning in embodied AI systems. However, existing approaches often over-rely on the rea…

Engineering Preprint PDF DOI

CADRE: Card-Agnostic Domain-Aligned RF Embeddings for Virtual PIN Pads on Passive NFC Cards

Dickson Akuoko Sarpong, Hongzhi Guo · 2026

Near Field Communication (NFC) cards are widely used for identification, but their passive nature often limits the ability to incorporate additional security mechanisms. As a result, anyone holding th…

Engineering Preprint PDF DOI

Leveraging VR Robot Games to Facilitate Data Collection for Embodied Intelligence Tasks

Yihan Zhang, Ziyun Huang, Linqi Ye · 2026

Collecting embodied interaction data at scale remains costly and difficult due to the limited accessibility of conventional interfaces. We present a gamified data collection framework based on Unity t…

Engineering Preprint PDF DOI

Chain Of Interaction Benchmark (COIN): When Reasoning meets Embodied Interaction

Xianhao Wang, Xiaojian Ma, Haozhe Hu, Rongpeng Su, Yutian Cheng, Zhou Ziheng, Hangxin Liu, Lei Liu, Bin Li, Qing Li · 2026

Generalist embodied agents must perform interactive, causally-dependent reasoning, continually interacting with the environment, acquiring information, and updating plans to solve long-horizon tasks b…

Engineering Preprint PDF DOI

Watching Physics: the Generative Science of Matter and Motion

Hagen Holthusen, Kevin Linka, Ellen Kuhl · 2026

Can we learn the physics of matter in motion directly from images and video--and trust it? Answering this question requires integrating experiments, physics-based simulation, and data across tradition…

Engineering Preprint PDF DOI

ReconVLA: An Uncertainty-Guided and Failure-Aware Vision-Language-Action Framework for Robotic Control

Lingling Chen, Zongyao Lyu, William J. Beksi · 2026

Vision-language-action (VLA) models have emerged as generalist robotic controllers capable of mapping visual observations and natural language instructions to continuous action sequences. However, VLA…

Engineering Preprint PDF DOI

Human Cognition in Machines: A Unified Perspective of World Models

Timothy Rupprecht, Pu Zhao, Amir Taherin, Arash Akbari, Arman Akbari, Yumei He, Sean Duffy, Juyi Lin, Yixiao Chen, Rahul Chowdhury, Enfu Nan, Yixin Shen, Yifan Cao, Haochen Zeng, Weiwei Chen, Geng Yuan, Jennifer Dy, Sarah Ostadabbas, Silvia Zhang, David Kaeli, Edmund Yeh, Yanzhi Wang · 2026

This comprehensive report distinguishes prior works by the cognitive functions they innovate. Many works claim an almost "human-like" cognitive capability in their world models. To evaluate these clai…

Engineering Preprint PDF DOI

DENALI: A Dataset Enabling Non-Line-of-Sight Spatial Reasoning with Low-Cost LiDARs

Nikhil Behari, Diego Rivero, Luke Apostolides, Suman Ghosh, Paul Pu Liang, Ramesh Raskar · 2026

Consumer LiDARs in mobile devices and robots typically output a single depth value per pixel. Yet internally, they record full time-resolved histograms containing direct and multi-bounce light returns…

Engineering Preprint PDF DOI

Dual-Modal Lung Cancer AI: Interpretable Radiology and Microscopy with Clinical Risk Integration

Baramee Sukumal, Aueaphum Aueawatthanaphisut · 2026

Lung cancer remains one of the leading causes of cancer-related mortality worldwide. Conventional computed tomography (CT) imaging, while essential for detection and staging, has limitations in distin…

Engineering Preprint PDF DOI

Unified Error Analysis of Multi-site Radar via Equivalent Angular Resolution

Lang Qin, Zelin Liu, Rongjie Li, Zhiqiang Huang, Xiaoguang Liu · 2026

High-precision indoor sensing using monostatic multiple-input multiple-output (MIMO) radar typically relies on increasing the physical aperture size of antennas, leading to high hardware complexity an…

Engineering Preprint PDF DOI

VADF: Vision-Adaptive Diffusion Policy Framework for Efficient Robotic Manipulation

Xinglei Yu, Zhenyang Liu, Shufeng Nan, Simo Wu, Yanwei Fu · 2026

Diffusion policies are becoming mainstream in robotic manipulation but suffer from hard negative class imbalance due to uniform sampling and lack of sample difficulty awareness, leading to slow traini…

Engineering Preprint PDF DOI

TV-Regularized Frequency-Domain Full-Waveform Inversion for Single-Sided Linear Ultrasound Array Data

Rui Guo, Ditza Auerbach, Yonina C. Eldar · 2026

Quantitative speed-of-sound (SoS) and attenuation of tissues are closely related to pathology; however, conventional B-mode images are limited to qualitative visualization. Existing ultrasound full-wa…

Engineering Preprint PDF DOI

When structure does not imply symmetry

Skyler R. St. Pierre, Thibault Vervenne, Ethan C. Darwin, Ellen Kuhl · 2026

Fungal protein materials exhibit inherently anisotropic microstructures formed by networks of hyphae, which suggest a natural pathway to replicate the fibrous texture of animal meat. We probe whether …
