Programming Languages in Engineering — Research Repository

Engineering Preprint PDF DOI

Towards the Vision-Sound-Language-Action Paradigm: The HEAR Framework for Sound-Centric Manipulation

Chang Nie, Tianchen Deng, Guangming Wang, Zhe Liu, Hesheng Wang · 2026

While recent Vision-Language-Action (VLA) models have begun to incorporate audio, they typically treat sound as static pre-execution prompts or focus exclusively on human speech. This leaves a signifi…

Large Reward Models: Generalizable Online Robot Reward Generation with Vision-Language Models

Yanru Wu, Weiduo Yuan, Ang Qi, Vitor Guizilini, Jiageng Mao, Yue Wang · 2026

Reinforcement Learning (RL) has shown great potential in refining robotic manipulation policies, yet its efficacy remains strongly bottlenecked by the difficulty of designing generalizable reward func…

Compact Optical Single-axis Joint Torque Sensor Using Redundant Photo-Reflectors and Quadratic-Programming Calibration

Hyun-Bin Kim, Byeong-Il Ham, Kyung-Soo Kim · 2026

This study proposes a non-contact photo-reflector-based joint torque sensor for precise joint-level torque control and safe physical interaction. Current-sensor-based torque estimation in many collabo…

Geometry-Aligned LLM Fine-Tuning for Sequential Narrow-Opening Planning

Al Jaber Mahmud, Xuan Wang · 2026

We study rigid-body motion planning through multiple sequential narrow openings, which requires long-horizon geometric reasoning because the configuration used to traverse an early opening constrains …

Safety Case Patterns for VLA-based driving systems: Insights from SimLingo

Gerhard Yu, Fuyuki Ishikawa, Oluwafemi Odu, Alvine Boaye Belle · 2026

Vision-Language-Action (VLA)-based driving systems represent a significant paradigm shift in autonomous driving since, by combining traffic scene understanding, linguistic interpretation, and action g…

Something from Nothing: Data Augmentation for Robust Severity Level Estimation of Dysarthric Speech

Jaesung Bae, Xiuwen Zheng, Minje Kim, Chang D. Yoo, Mark Hasegawa-Johnson · 2026

Dysarthric speech quality assessment (DSQA) is critical for clinical diagnostics and inclusive speech technologies. However, subjective evaluation is costly and difficult to scale, and the scarcity of…

ExpertGen: Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavior Priors

Zifan Xu, Ran Gong, Maria Vittoria Minniti, Ahmet Salih Gundogdu, Eric Rosen, Kausik Sivakumar, Riedana Yan, Zixing Wang, Di Deng, Peter Stone, Xiaohan Zhang, Karl Schmeckpeper · 2026

Learning generalizable and robust behavior cloning policies requires large volumes of high-quality robotics data. While human demonstrations (e.g., through teleoperation) serve as the standard source …

Embodied Foundation Models at the Edge: A Survey of Deployment Constraints and Mitigation Strategies

Utkarsh Grover, Ravi Ranjan, Mingyang Mao, Trung Tien Dong, Satvik Praveen, Zhenqi Wu, J. Morris Chang, Tinoosh Mohsenin, Yi Sheng, Agoritsa Polyzou, Eiman Kanjo, Xiaomin Lin · 2026

Deploying foundation models in embodied edge systems is fundamentally a systems problem, not just a problem of model compression. Real-time control must operate within strict size, weight, and power c…

CorrectionPlanner: Self-Correction Planner with Reinforcement Learning in Autonomous Driving

Yihong Guo, Dongqiangzi Ye, Sijia Chen, Anqi Liu, Xianming Liu · 2026

Autonomous driving requires safe planning, but most learning-based planners lack explicit self-correction ability: once an unsafe action is proposed, there is no mechanism to correct it. Thus, we prop…

MA-VLCM: A Vision Language Critic Model for Value Estimation of Policies in Multi-Agent Team Settings

Shahil Shaik, Aditya Parameshwaran, Anshul Nayak, Jonathon M. Smereka, Yue Wang · 2026

Multi-agent reinforcement learning (MARL) commonly relies on a centralized critic to estimate the value function. However, learning such a critic from scratch is highly sample-inefficient and often la…

MoE-ACT: Scaling Multi-Task Bimanual Manipulation with Sparse Language-Conditioned Mixture-of-Experts Transformers

Kangjun Guo, Haichao Liu, Yanji Sun, Ruhan Zhao, Jinni Zhou, Jun Ma · 2026

The ability of robots to handle multiple tasks under a unified policy is critical for deploying embodied intelligence in real-world household and industrial applications. However, out-of-distribution …

HapticVLA: Contact-Rich Manipulation via Vision-Language-Action Model without Inference-Time Tactile Sensing

Konstantin Gubernatorov, Mikhail Sannikov, Ilya Mikhalchuk, Egor Kuznetsov, Makar Artemov, Ogunwoye Faith Ouwatobi, Marcelino Fernando, Artem Asanov, Ziang Guo, Dzmitry Tsetserukou · 2026

Tactile sensing is a crucial capability for Vision-Language-Action (VLA) architectures, as it enables dexterous and safe manipulation in contact-rich tasks. However, reliance on dedicated tactile hard…

NavGSim: High-Fidelity Gaussian Splatting Simulator for Large-Scale Navigation

Jiahang Liu, Yuanxing Duan, Jiazhao Zhang, Minghan Li, Shaoan Wang, Zhizheng Zhang, He Wang · 2026

Simulating realistic environments for robots is widely recognized as a critical challenge in robot learning, particularly in terms of rendering and physical simulation. This challenge becomes even mor…

ForceVLA2: Unleashing Hybrid Force-Position Control with Force Awareness for Contact-Rich Manipulation

Yang Li, Zhaxizhuoma, Hongru Jiang, Junjie Xia, Hongquan Zhang, Jinda Du, Yunsong Zhou, Jia Zeng, Ce Hao, Jieji Ren, Qiaojun Yu, Cewu Lu, Yu Qiao, Jiangmiao Pang · 2026

Embodied intelligence for contact-rich manipulation has predominantly relied on position control, while explicit awareness and regulation of interaction forces remain under-explored, limiting stabilit…

Vision-Language Model Based Multi-Expert Fusion for CT Image Classification

Jianfa Bai, Kejin Lu, Runtian Yuan, Qingqiu Li, Jilan Xu, Junlin Hou, Yuejie Zhang, Rui Feng · 2026

Robust detection of COVID-19 from chest CT remains challenging in multi-institutional settings due to substantial source shift, source imbalance, and hidden test-source identities. In this work, we pr…

Confusion-Aware In-Context-Learning for Vision-Language Models in Robotic Manipulation

Yayun He, Zuheng Kang, Botao Zhao, Zhouyin Wu, Junqing Peng, Jianzong Wang · 2026

Vision-language models (VLMs) have significantly improved the generalization capabilities of robotic manipulation. However, VLM-based systems often suffer from a lack of robustness, leading to unpredi…

AeroGrab: A Unified Framework for Aerial Grasping in Cluttered Environments

Shivansh Pratap Singh, Naveen Sudheer Nair, Samaksh Ujjawal, Sarthak Mishra, Soham Patil, Rishabh Dev Yadav, Spandan Roy · 2026

Reliable aerial grasping in cluttered environments remains challenging due to occlusions and collision risks. Existing aerial manipulation pipelines largely rely on centroid-based grasping and lack in…

Beam Prediction Based on Multimodal Large Language Models

Tianhao Mao, Le Liang, Jie Yang, Xiao Li, Shi Jin, Geoffrey Ye Li · 2026

Accurate beam prediction is a key enabler for next-generation wireless communication systems. In this paper, we propose a multimodal large language model (LLM)-based beam prediction framework that eff…

AnoleVLA: Lightweight Vision-Language-Action Model with Deep State Space Models for Mobile Manipulation

Yusuke Takagi, Motonari Kambara, Daichi Yashima, Koki Seno, Kento Tokura, Komei Sugiura · 2026

In this study, we address the problem of language-guided robotic manipulation, where a robot is required to manipulate a wide range of objects based on visual observations and natural language instruc…

Learning from Mistakes: Post-Training for Driving VLA with Takeover Data

Yinfeng Gao, Deqing Liu, Qichao Zhang, Yupeng Zheng, Haochen Tian, Guang Li, Hangjun Ye, Long Chen, Da-Wei Ding, Dongbin Zhao · 2026

Current Vision-Language-Action (VLA) paradigms in end-to-end autonomous driving rely on offline training from static datasets, leaving them vulnerable to distribution shift. Recent post-training metho…
