Visual Perception in Engineering — Research Repository

Engineering · Preprint

The False Resonance: A Critical Examination of Emotion Embedding Similarity for Speech Generation Evaluation

Yun-Shao Tsai, Yi-Cheng Lin, Huang-Cheng Chou, Tzu-Wen Hsu, Yun-Man Hsu, Chun Wei Chen, Shrikanth Narayanan, Hung-yi Lee · 2026

Objective metrics for emotional expressiveness are vital for speech generation, particularly in expressive synthesis and voice conversion requiring emotional prosody transfer. To quantify this, the fi…

Engineering · Preprint

Optimizing Tracking Accuracy in Energy-Constrained Multimodal ISAC via Lyapunov-Driven Heterogeneous Mixture-of-Experts

Wenqi Fan, Ning Wei, Ahmad Bazzi, Rongyan Xi, Zhixian Song, You Li, Zhihan Zeng, Yue Xiu, Chadi Assi · 2026

The integration of multimodal sensing and millimeter-wave (mmWave) communications is a key enabler for highly mobile vehicle-to-infrastructure (V2I) networks. However, continuous high-resolution visua…

Engineering · Preprint

Robot Planning and Situation Handling with Active Perception

Austine Oloo, Zainab Altaweel, Yohei Hayamizu, Peiqi Liu, Yan Ding, Saeid Amiri, Hao Yang, Andy Kaminski, Chad Esselink, Chris Paxton, Xiaohan Zhang, Shiqi Zhang · 2026

Current robots are capable of computing plans to accomplish complex tasks. However, real-world environments are inherently open and dynamic, and unforeseen situations frequently arise during plan exec…

Engineering · Preprint

Privileged Foresight Distillation: Zero-Cost Future Correction for World Action Models

Pengcheng Fang, Hongli Chen, Xiaohao Cai · 2026

World action models jointly predict future video and action during training, raising an open question about what role the future-prediction branch actually plays. A recent finding shows that this bran…

Engineering · Preprint

KinDER: A Physical Reasoning Benchmark for Robot Learning and Planning

Yixuan Huang, Bowen Li, Vaibhav Saxena, Yichao Liang, Utkarsh Aashu Mishra, Liang Ji, Lihan Zha, Jimmy Wu, Nishanth Kumar, Sebastian Scherer, Danfei Xu, Tom Silver · 2026

Robotic systems that interact with the physical world must reason about kinematic and dynamic constraints imposed by their own embodiment, their environment, and the task at hand. We introduce KinDER,…

Engineering · Preprint

GS-Playground: A High-Throughput Photorealistic Simulator for Vision-Informed Robot Learning

Yufei Jia, Heng Zhang, Ziheng Zhang, Junzhe Wu, Mingrui Yu, Zifan Wang, Dixuan Jiang, Zheng Li, Chenyu Cao, Zhuoyuan Yu, Xun Yang, Haizhou Ge, Yuchi Zhang, Jiayuan Zhang, Zhenbiao Huang, Tianle Liu, Shenyu Chen, Jiacheng Wang, Bin Xie, Xuran Yao, Xiwa Deng, Guangyu Wang, Jinzhi Zhang, Lei Hao, Zhixing Chen, Yuxiang Chen, Anqi Wang, Hongyun Tian, Yiyi Yan, Zhanxiang Cao, Yizhou Jiang, Hanyang Shao, Yue Li, Lu Shi, Bokui Chen, Wei Sui, Hanqing Cui, Yusen Qin, Ruqi Huang, Lei Han, Tiancai Wang, Guyue Zhou · 2026

Embodied AI research is undergoing a shift toward vision-centric perceptual paradigms. While massively parallel simulators have catalyzed breakthroughs in proprioception-based locomotion, their potent…

Engineering · Preprint

ANCHOR: A Physically Grounded Closed-Loop Framework for Robust Home-Service Mobile Manipulation

Jinhao Jiang, Shengyu Fang, Sibo Zuo, Yujie Tang, Yirui Li · 2026

Recent advances in open-vocabulary mobile manipulation have brought robots into real domestic environments. In such settings, reliable long-horizon execution under open-set object references and frequ…

Engineering · Preprint

Cross-Linguistic Rhythmic and Spectral Feature-Based Analysis of Nyishi and Adi: Two Under-Resourced Languages of Arunachal Pradesh

Deepshikha Gogoi, Parismita Gogoi, Yang Saring · 2026

Under-resourced languages remain underrepresented in quantitative rhythm research, particularly in systematic intra-branch analysis of acoustic differentiation within closely related linguistic groups.…

Engineering · Preprint

TEACar: An Open-Source Autonomous Driving Platform

Zhongzheng Zhang, Maxwell Ruyle, Andrew Kappes, Tyler Ruble, William Shaoul, Dana Moreno, Jack Penn, Ivan Ruchkin · 2026

Intelligent Transportation Systems (ITS) increasingly rely on vision-based perception and learning-based control, necessitating experimental platforms that support realistic hardware-in-the-loop valid…

Engineering · Preprint

Libra-VLA: Achieving Learning Equilibrium via Asynchronous Coarse-to-Fine Dual-System

Yifei Wei, Linqing Zhong, Yi Liu, Yuxiang Lu, Xindong He, Maoqing Yao, Guanghui Ren · 2026

Vision-Language-Action (VLA) models are a promising paradigm for generalist robotic manipulation by grounding high-level semantic instructions into executable physical actions. However, prevailing app…

Engineering · Preprint

An analysis of sensor selection for fruit picking with suction-based grippers

Eva Krueger, Marcus Rosette, Joseph R. Davidson · 2026

Robotic fruit harvesting often fails to reliably detect whether a fruit has been successfully picked, limiting efficiency and increasing crop damage. This problem is difficult due to compliant fruit a…

Engineering · Preprint

VISION-SLS: Safe Perception-Based Control from Learned Visual Representations via System Level Synthesis

Antoine P. Leeman, Shuyu Zhan, Melanie N. Zeilinger, Glen Chou · 2026

We propose VISION-SLS, a method for nonlinear output-feedback control from high-resolution RGB images which provides robust constraint satisfaction guarantees under calibrated uncertainty bounds despi…

Engineering · Preprint

Passage-Aware Structural Mapping for RGB-D Visual SLAM

Ali Tourani, Miguel Fernandez-Cortizas, Saad Ejaz, David Perez Saura, Asier Bikandi-Noya, Jose Luis Sanchez-Lopez, Holger Voos · 2026

Doorways and passages are critical structural elements for indoor robot navigation, yet they remain underexplored in modern Visual SLAM (VSLAM) frameworks. This paper presents a passage-aware structur…

Engineering · Preprint

Agent-Centric Visual Reinforcement Learning under Dynamic Perturbations

Zhengru Fang, Yu Guo, Fei Liu, Yuang Zhang, Yihang Tao, Senkang Hu, Wenbo Ding, Yuguang Fang · 2026

Visual reinforcement learning aims to empower an agent to learn policies from visual observations, yet it remains vulnerable to dynamic visual perturbations, such as unpredictable shifts in corruption…

Engineering · Preprint

FreqCache: Accelerating Embodied VLN Models with Adaptive Frequency-Guided Token Caching

Zihao Zheng, Xingyue Zhou, Zhihao Mao, Songyu Sun, Lingyue Zhang, Yulong Ao, Yupu Feng, Qiongqiong Zhang, Yonghua Lin, Xiang Chen · 2026

Vision-Language-Navigation (VLN) models exhibit excellent navigation accuracy but incur high computational overhead. Token caching has emerged as a promising training-free strategy to reduce this cost…

Engineering · Preprint

Deep Learning-Enabled Dissolved Oxygen Sensing in Biofouling Environments for Ocean Monitoring

Nikolaos Salaris, Adrien Desjardins, Manish K. Tiwari · 2026

The escalating climate crisis and ecosystem degradation demand intelligent, low-cost sensors capable of robust, long-term monitoring in real-world environments. Absolute dissolved oxygen (DO) concentr…

Engineering · Preprint

AsyncShield: A Plug-and-Play Edge Adapter for Asynchronous Cloud-based VLA Navigation

Kai Yang, Zedong Chu, Yingnan Guo, Zhengbo Wang, Shichao Xie, Yanfen Shen, Xiaolong Wu, Xing Li, Mu Xu · 2026

While Vision-Language-Action (VLA) models have been shown to possess strong zero-shot generalization for robot control, their massive parameter sizes typically necessitate cloud-based deploymen…

Engineering · Preprint

Event-based SLAM Benchmark for High-Speed Maneuvers

Sheng Zhong, Junkai Niu, Guillermo Gallego, Kaizhen Sun, Yang Yi, Zhiqiang Miao, Dewen Hu, Yaonan Wang, Davide Scaramuzza, Yi Zhou · 2026

Event-based cameras are bio-inspired sensors with pixels that independently and asynchronously respond to brightness changes at microsecond resolution, offering the potential to handle visual tasks in…

Engineering · Preprint

VLM-VPI: A Vision-Language Reasoning Framework for Improving Automated Vehicle-Pedestrian Interactions

Qingwen Pu, Kun Xie, Yuxiang Liu · 2026

Autonomous driving systems often infer pedestrian yielding behavior from geometric and kinematic cues alone, limiting their ability to reason about visual scene context and age-dependent behavioral va…

Engineering · Preprint

Decentralized Heterogeneous Multi-Robot Collaborative Exploration for Indoor and Outdoor 3D Environments

Yuxiang Li, Kun Chen, Jiancheng Wang, Shihao Fang, Haoyao Chen, Yunhui Liu · 2026

Heterogeneous multi-robot systems feature significant adaptability for complex environments. However, effective collaboration that fully exploits the robots' potential remains a core challenge. This p…
