Programming Languages in Engineering — Research Repository

Engineering Preprint PDF DOI

Vision-Language-Action in Robotics: A Survey of Datasets, Benchmarks, and Data Engines

Ziyao Wang, Bingying Wang, Hanrong Zhang, Tingting Du, Tianyang Chen, Guoheng Sun, Yexiao He, Zheyu Shen, Wanghao Ye, Ang Li · 2026

Despite remarkable progress in Vision-Language-Action (VLA) models, a central bottleneck remains underexamined: the data infrastructure that underlies embodied learning. In this survey, we argue tha…

RedVLA: Physical Red Teaming for Vision-Language-Action Models

Yuhao Zhang, Borong Zhang, Jiaming Fan, Jiachen Shen, Yishuai Cai, Yaodong Yang, Jiaming Ji · 2026

The real-world deployment of Vision-Language-Action (VLA) models remains limited by the risk of unpredictable and irreversible physical harm. However, we currently lack effective mechanisms to proacti…

MTT-Bench: Predicting Social Dominance in Mice via Multimodal Large Language Models

Yunquan Chen, Haoyu Chen · 2026

Understanding social dominance in animal behavior is critical for neuroscience and behavioral studies. In this work, we explore the capability of Multimodal Large Language Models (MLLMs) to analyze raw…

DM-ASR: Diarization-aware Multi-speaker ASR with Large Language Models

Li Li, Ming Cheng, Weixin Zhu, Yannan Wang, Juan Liu, Ming Li · 2026

Multi-speaker automatic speech recognition (ASR) aims to transcribe conversational speech involving multiple speakers, requiring the model to capture not only what was said, but also who said it and s…

Listening with Time: Precise Temporal Awareness for Long-Form Audio Understanding

Mingchen Shao, Hang Su, Wenjie Tian, Bingshen Mu, Zhennan Lin, Lichun Fan, Zhenbo Luo, Jian Luan, Lei Xie · 2026

While Large Audio Language Models (LALMs) achieve strong performance on short audio, they degrade on long-form inputs. This degradation is more severe in temporal awareness tasks, where temporal align…

CodeGraphVLP: Code-as-Planner Meets Semantic-Graph State for Non-Markovian Vision-Language-Action Models

Khoa Vo, Sieu Tran, Taisei Hanyu, Yuki Ikebe, Duy Nguyen, Bui Duy Quoc Nghi, Minh Vu, Anthony Gunderman, Chase Rainwater, Anh Nguyen, Ngan Le · 2026

Vision-Language-Action (VLA) models promise generalist robot manipulation, but are typically trained and deployed as short-horizon policies that assume the latest observation is sufficient for action …

UniSonate: A Unified Model for Speech, Music, and Sound Effect Generation with Text Instructions

Chunyu Qiang, Xiaopeng Wang, Kang Yin, Yuzhe Liang, Yuxin Guo, Teng Ma, Ziyu Zhang, Tianrui Wang, Cheng Gong, Yushen Chen, Ruibo Fu, Chen Zhang, Longbiao Wang, Jianwu Dang · 2026

Generative audio modeling has largely been fragmented into specialized tasks (text-to-speech (TTS), text-to-music (TTM), and text-to-audio (TTA)), each operating under heterogeneous control paradigms. …

An LLM-Driven Closed-Loop Autonomous Learning Framework for Robots Facing Uncovered Tasks in Open Environments

Hong Su · 2026

Autonomous robots operating in open environments need the ability to continuously handle tasks that are not covered by predefined local methods. However, existing approaches often rely on repeated lar…

dWorldEval: Scalable Robotic Policy Evaluation via Discrete Diffusion World Model

Yaxuan Li, Zhongyi Zhou, Yefei Chen, Yaokai Xue, Yichen Zhu · 2026

Evaluating robotics policies across thousands of environments and thousands of tasks is infeasible with existing approaches. This motivates the need for a new methodology for scalable robotics policy …

A Hybrid Reinforcement and Self-Supervised Learning Aided Benders Decomposition Algorithm

Bernard T. Agyeman, Zhe Li, Ilias Mitrai, Prodromos Daoutidis · 2026

We propose a hybrid reinforcement and self-supervised learning framework for accelerating generalized Benders decomposition (GBD). In this framework, a graph-based reinforcement learning agent operate…

Long-Horizon Manipulation via Trace-Conditioned VLA Planning

Isabella Liu, An-Chieh Cheng, Rui Yan, Geng Chen, Ri-Zhao Qiu, Xueyan Zou, Sha Yi, Hongxu Yin, Xiaolong Wang, Sifei Liu · 2026

Long-horizon manipulation remains challenging for vision-language-action (VLA) policies: real tasks are multi-step, progress-dependent, and brittle to compounding execution errors. We present LoHo-Man…

A Multi-Stage Warm-Start Deep Learning Framework for Unit Commitment

Muhy Eddin Za'ter, Anna Van Boven, Bri-Mathias Hodge, Kyri Baker · 2026

Maintaining instantaneous balance between electricity supply and demand is critical for reliability and grid stability. System operators achieve this by solving the Unit Commitment (UC)…

Using Assembly Language for Creating Games

Haris Turkmanovic, David Vukoje, Aleksandra Lekic, Milan Prokin · 2026

The aim of this paper is to demonstrate some interesting and useful approaches to writing programs in assembly language. In order to demonstrate the possibilities of assembly language, a pro…

From Noise to Intent: Anchoring Generative VLA Policies with Residual Bridges

Yiming Zhong, Yaoyu He, Zemin Yang, Pengfei Tian, Yifan Huang, Qingqiu Huang, Xinge Zhu, Yuexin Ma · 2026

Bridging high-level semantic understanding with low-level physical control remains a persistent challenge in embodied intelligence, stemming from the fundamental spatiotemporal scale mismatch between …

A Replicable Robotics Awareness Method Using LLM-Enabled Robotics Interaction: Evidence from a Corporate Challenge

S. A. Prieto, M. A. Gopee, Y. Ben Arab, B. Garcia de Soto, J. Esteba, P. Olivera Brizzio · 2026

Large language models are increasingly being explored as interfaces between humans and robotic systems, yet there remains limited evidence on how such technologies can be used not only for interaction…

A Deployable Embodied Vision-Language Navigation System with Hierarchical Cognition and Context-Aware Exploration

Kuan Xu, Ruimeng Liu, Yizhuo Yang, Denan Liang, Tongxing Jin, Shenghai Yuan, Chen Wang, Lihua Xie · 2026

Bridging the gap between embodied intelligence and embedded deployment remains a key challenge in intelligent robotic systems, where perception, reasoning, and planning must operate under strict const…

Reasoning About Traversability: Language-Guided Off-Road 3D Trajectory Planning

Byounggun Park, Soonmin Hwang · 2026

While Vision-Language Models (VLMs) enable high-level semantic reasoning for end-to-end autonomous driving, particularly in unstructured environments, existing off-road datasets suffer from language a…

CorridorVLA: Explicit Spatial Constraints for Generative Action Heads via Sparse Anchors

Dachong Li, ZhuangZhuang Chen, Jin Zhang, Jianqiang Li · 2026

Vision-Language-Action (VLA) models often use intermediate representations to connect multimodal inputs with continuous control, yet spatial guidance is often injected implicitly through latent feat…

How VLAs (Really) Work In Open-World Environments

Amir Rasouli, Yangzheng Wu, Zhiyuan Li, Rui Heng Yang, Xuan Zhao, Charles Eret, Sajjad Pakdamansavoji · 2026

Vision-language-action models (VLAs) have been extensively used in robotics applications, achieving great success in various manipulation problems. More recently, VLAs have been used in long-horizon t…

Efficient Design of Fronthaul-Constrained Uplink Reception for Cell-Free XL-MIMO

Dogon Kim, Hyunmin Noh, Seok-Hwan Park · 2026

With the evolution of multiple-input multiple-output (MIMO) technology toward extremely large (XL) MIMO systems comprising hundreds of, or more, antennas, this work investigates scalable and fronthaul…
