Programming Languages in Engineering — Research Repository

Engineering Preprint

AnyUser: Translating Sketched User Intent into Domestic Robots

Songyuan Yang, Huibin Tan, Kailun Yang, Wenjing Yang, Shaowu Yang · 2026

We introduce AnyUser, a unified robotic instruction system for intuitive domestic task instruction via free-form sketches on camera images, optionally with language. AnyUser interprets multimodal inpu…

Engineering Preprint

ROSClaw: A Hierarchical Semantic-Physical Framework for Heterogeneous Multi-Agent Collaboration

Rongfeng Zhao, Xuanhao Zhang, Zhaochen Guo, Xiang Shao, Zhongpan Zhu, Bin He, Jie Chen · 2026

The integration of large language models (LLMs) with embodied agents has improved high-level reasoning capabilities; however, a critical gap remains between semantic understanding and physical executi…

Engineering Preprint

Visual Prompt Based Reasoning for Offroad Mapping using Multimodal LLMs

Abdelmoamen Nasser, Yousef Baba'a, Murad Mebrahtu, Nadya Abdel Madjid, Jorge Dias, Majid Khonji · 2026

Traditional approaches to off-road autonomy rely on separate models for terrain classification, height estimation, and quantifying slip or slope conditions. Utilizing several models requires training …

Engineering Preprint

Veo-Act: How Far Can Frontier Video Models Advance Generalizable Robot Manipulation?

Zhongru Zhang, Chenghan Yang, Qingzhou Lu, Yanjiang Guo, Jianke Zhang, Yucheng Hu, Jianyu Chen · 2026

Video generation models have advanced rapidly and are beginning to show a strong understanding of physical dynamics. In this paper, we investigate how far an advanced video generation model such as Ve…

Engineering Preprint

Precise Robot Command Understanding Using Grammar-Constrained Large Language Models

Xinyun Huo, Raghav Gnanasambandam, Xinyao Zhang · 2026

Human-robot collaboration in industrial settings requires precise and reliable communication to enhance operational efficiency. While Large Language Models (LLMs) understand general language, they oft…

Engineering Preprint

Adaptive Action Chunking at Inference-time for Vision-Language-Action Models

Yuanchang Liang, Xiaobo Wang, Kai Wang, Shuo Wang, Xiaojiang Peng, Haoyu Chen, David Kim Huat Chua, Prahlad Vadakkepat · 2026

In Vision-Language-Action (VLA) models, action chunking (i.e., executing a sequence of actions without intermediate replanning) is a key technique to improve robotic manipulation abilities. However, a…

Engineering Preprint

AffectSpeech: A Large-Scale Emotional Speech Dataset with Fine-Grained Textual Descriptions for Speech Emotion Captioning and Synthesis

Tianhua Qi, Wenming Zheng, Bjorn W. Schuller, Zhaojie Luo, Haizhou Li · 2026

Emotion is essential in spoken communication, yet most existing frameworks in speech emotion modeling rely on predefined categories or low-dimensional continuous attributes, which offer limited expres…

Engineering Preprint

Enhancing 6G Wireless Intelligence: Do LLMs Work for CSI Prediction?

Mohsen Kazemian, Jurgen Jasperneite · 2026

In high-mobility 6G scenarios, rapidly time-varying channels lead to very short coherence times, which makes conventional pilot-based channel state information (CSI) estimation approaches prone to out…

Engineering Preprint

Dynamic Whole-Body Dancing with Humanoid Robots -- A Model-Based Control Approach

Shibowen Zhang, Jiayang Wu, Guannan Liu, Helin Zhu, Junjie Liu, Zhehan Li, Junhong Guo, Xiaokun Leng, Hangxin Liu, Jingwen Zhang, Jikai Wang, Zonghai Chen, Zhicheng He, Jiayi Wang, Yao Su · 2026

This paper presents an integrated model-based framework for generating and executing dynamic whole-body dance motions on humanoid robots. The framework operates in two stages: offline motion generatio…

Engineering Preprint

From Prompt to Physical Action: Structured Backdoor Attacks on LLM-Mediated Robotic Control Systems

Mingyang Xie, Jin Wei-Kocsis · 2026

The integration of large language models (LLMs) into robotic control pipelines enables natural language interfaces that translate user prompts into executable commands. However, this digital-to-physic…

Engineering Preprint

OpenRC: An Open-Source Robotic Colonoscopy Framework for Multimodal Data Acquisition and Autonomy Research

Siddhartha Kapuria, Mohammad Rafiee Javazm, Naruhiko Ikoma, Joga Ivatury, Mohammad Ali Nasseri, Nassir Navab, Farshid Alambeigi · 2026

Colorectal cancer screening critically depends on colonoscopy, yet existing platforms offer limited support for systematically studying the coupled dynamics of operator control, instrument motion, and…

Engineering Preprint

Build on Priors: Vision--Language--Guided Neuro-Symbolic Imitation Learning for Data-Efficient Real-World Robot Manipulation

Pierrick Lorang, Johannes Huemer, Timothy Duggan, Kai Goebel, Patrik Zips, Matthias Scheutz · 2026

Enabling robots to learn long-horizon manipulation tasks from a handful of demonstrations remains a central challenge in robotics. Existing neuro-symbolic approaches often rely on hand-crafted symboli…

Engineering Preprint

Belief Dynamics for Detecting Behavioral Shifts in Safe Collaborative Manipulation

Devashri Naik, Divake Kumar, Nastaran Darabi, Amit Ranjan Trivedi · 2026

Robots operating in shared workspaces must maintain safe coordination with other agents whose behavior may change during task execution. When a collaborating agent switches strategy mid-episode, conti…

Engineering Preprint

Do Robots Need Body Language? Comparing Communication Modalities for Legible Motion Intent in Human-Shared Spaces

Jonathan Albert Cohen, Kye Shimizu, Allen Song, Vishnu Bharath, Kent Larson, Pattie Maes · 2026

Robots in shared spaces often move in ways that are difficult for people to interpret, placing the burden on humans to adapt. High-DoF robots exhibit motion that people read as expressive, intentional…

Engineering Preprint

ARES OS 2.0: An Orchestration Software Suite for Autonomous Experimentation Systems and Self-Driving Labs

Arthur W. N. Sloan, Robert W. Waelder, Morgen L. Smith, Nicholas Kleiner, Arnas Babeckis, Jason Wheeler, Daylond Hooper, Benji Maruyama · 2026

ARES OS 2.0 (hereinafter ARES OS) is an open-source software suite to enable laboratory automation and closed-loop autonomous experimentation. Its function is to orchestrate experimental actions and d…

Engineering Preprint

The Compression Gap: Why Discrete Tokenization Limits Vision-Language-Action Model Scaling

Takuya Shiba · 2026

Scaling Vision-Language-Action (VLA) models by upgrading the vision encoder is expected to improve downstream manipulation performance, as it does in vision-language modeling. We show that this expect…

Engineering Preprint

Multi-View Video Diffusion Policy: A 3D Spatio-Temporal-Aware Video Action Model

Peiyan Li, Yixiang Chen, Yuan Xu, Jiabing Yang, Xiangnan Wu, Jun Guo, Nan Sun, Long Qian, Xinghang Li, Xin Xiao, Jing Liu, Nianfeng Liu, Tao Kong, Yan Huang, Liang Wang, Tieniu Tan · 2026

Robotic manipulation requires understanding both the 3D spatial structure of the environment and its temporal evolution, yet most existing policies overlook one or both. They typically rely on 2D visu…

Engineering Preprint

FSUNav: A Cerebrum-Cerebellum Architecture for Fast, Safe, and Universal Zero-Shot Goal-Oriented Navigation

Mingao Tan, Yiyang Li, Shanze Wang, Xinming Zhang, Wei Zhang · 2026

Current vision-language navigation methods face substantial bottlenecks regarding heterogeneous robot compatibility, real-time performance, and navigation safety. Furthermore, they struggle to support…

Engineering Preprint

Minimal Information Control Invariance via Vector Quantization

Ege Yuceel, Teodor Tchalakov, Sayan Mitra · 2026

Safety-critical autonomous systems must satisfy hard state constraints under tight computational and sensing budgets, yet learning-based controllers are often far more complex than safe operation requ…

Engineering Preprint

Open-Loop Planning, Closed-Loop Verification: Speculative Verification for VLA

Zihua Wang, Zhitao Lin, Ruibo Li, Yu Zhang, Xu Yang, Siya Mi, Xiu-Shen Wei · 2026

Vision-Language-Action (VLA) models, as large foundation models for embodied control, have shown strong performance in manipulation tasks. However, their performance comes at high inference cost. To i…
