Programming Languages in Engineering — Research Repository

Engineering Preprint PDF DOI

Vision-and-Language Navigation for UAVs: Progress, Challenges, and a Research Roadmap

Hanxuan Chen, Jie Zheng, Siqi Yang, Tianle Zeng, Siwei Feng, Songsheng Cheng, Ruilong Ren, Hanzhong Guo, Shuai Yuan, Xiangyue Wang, Kangli Wang, Ji Pei · 2026

Vision-and-Language Navigation for Unmanned Aerial Vehicles (UAV-VLN) represents a pivotal challenge in embodied artificial intelligence, focused on enabling UAVs to interpret high-level human command…

Engineering Preprint PDF DOI

Evolvable Embodied Agent for Robotic Manipulation via Long Short-Term Reflection and Optimization

Jianzong Wang, Botao Zhao, Yayun He, Junqing Peng, Xulong Zhang · 2026

Achieving general-purpose robotics requires empowering robots to adapt and evolve based on their environment and feedback. Traditional methods face limitations such as extensive training requirements,…

Engineering Preprint PDF DOI

Few-Shot and Pseudo-Label Guided Speech Quality Evaluation with Large Language Models

Ryandhimas E. Zezario, Dyah A. M. G. Wisnu, Szu-Wei Fu, Sabato Marco Siniscalchi, Hsin-Min Wang, Yu Tsao · 2026

In this paper, we introduce GatherMOS, a novel framework that leverages large language models (LLM) as meta-evaluators to aggregate diverse signals into quality predictions. GatherMOS integrates light…

Engineering Preprint PDF DOI

On the Optimality of Uncertain MDP Abstractions

Ibon Gracia, Morteza Lahijanian · 2026

We study the asymptotic optimality of abstraction-based control synthesis algorithms. Specifically, we consider uncertain MDP (UMDP) abstraction, and investigate whether refinement leads to optimal re…

Engineering Preprint PDF DOI

In-Sync: Adaptation of Speech Aware Large Language Models for ASR with Word Level Timestamp Predictions

Xulin Fan, Vishal Sunder, Samuel Thomas, Mark Hasegawa-Johnson, Brian Kingsbury, George Saon · 2026

Recent advances in speech-aware language models have coupled strong acoustic encoders with large language models, enabling systems that move beyond transcription to produce richer outputs. Among these…

Engineering Preprint PDF DOI

Synthesis and Deployment of Maximal Robust Control Barrier Functions through Adversarial Reinforcement Learning

Donggeon David Oh, Duy P. Nguyen, Haimin Hu, Jaime Fernandez Fisac · 2026

Robust control barrier functions (CBFs) provide a principled mechanism for smooth safety enforcement under worst-case disturbances. However, existing approaches typically rely on explicit, closed-form…

Engineering Preprint PDF DOI

Robotic Manipulation is Vision-to-Geometry Mapping ($f(v) \rightarrow G$): Vision-Geometry Backbones over Language and Video Models

Zijian Song, Qichang Li, Jiawei Zhou, Zhenlong Yuan, Tianshui Chen, Liang Lin, Guangrun Wang · 2026

At its core, robotic manipulation is a problem of vision-to-geometry mapping ($f(v) \rightarrow G$). Physical actions are fundamentally defined by geometric properties like 3D positions and spatial re…

Engineering Preprint PDF DOI

VULCAN: Vision-Language-Model Enhanced Multi-Agent Cooperative Navigation for Indoor Fire-Disaster Response

Shengding Liu, Qiben Yan · 2026

Indoor fire disasters pose severe challenges to autonomous search and rescue due to dense smoke, high temperatures, and dynamically evolving indoor environments. In such time-critical scenarios, multi…

Engineering Preprint PDF DOI

Distributionally Robust Stochastic MPC under Disturbance-Affine Feedback Policies

Xu Chen, Lorenz Dorschel · 2026

This study addresses the stochastic Model Predictive Control (MPC) problem for linear time-invariant systems subjected to unknown disturbance distributions. By leveraging the most recent disturbance d…

Engineering Preprint PDF DOI

Audio-Cogito: Towards Deep Audio Reasoning in Large Audio Language Models

Longhao Li, Hongjie Chen, Zehan Li, Qihan Hu, Jian Kang, Jie Li, Lei Xie, Yongxiang Li · 2026

Recent advances in reasoning models have driven significant progress in text and multimodal domains, yet audio reasoning remains relatively limited. Only a few Large Audio Language Models (LALMs) inco…

Engineering Preprint PDF DOI

DeCoNav: Dialog enhanced Long-Horizon Collaborative Vision-Language Navigation

Sunyao Zhou, Yunzi Wu, Tianhang Wang, Xinhai Li, Guang Chen, Lizheng Liu, Chenjia Bai, Xuelong Li · 2026

Long-horizon collaborative vision-language navigation (VLN) is critical for multi-robot systems to accomplish complex tasks beyond the capability of a single agent. CoNavBench takes a first step by in…

Engineering Preprint PDF DOI

X-VC: Zero-shot Streaming Voice Conversion in Codec Space

Qixi Zheng, Yuxiang Zhao, Tianrui Wang, Wenxi Chen, Kele Xu, Yikang Li, Qinyuan Chen, Xipeng Qiu, Kai Yu, Xie Chen · 2026

Zero-shot voice conversion (VC) aims to convert a source utterance into the voice of an unseen target speaker while preserving its linguistic content. Although recent systems have improved conversion …

Engineering Preprint PDF DOI

HazardArena: Evaluating Semantic Safety in Vision-Language-Action Models

Zixing Chen, Yifeng Gao, Li Wang, Yunhan Zhao, Yi Liu, Jiayu Li, Xiang Zheng, Zuxuan Wu, Cong Wang, Xingjun Ma, Yu-Gang Jiang · 2026

Vision-Language-Action (VLA) models inherit rich world knowledge from vision-language backbones and acquire executable skills via action demonstrations. However, existing evaluations largely focus on …

Engineering Preprint PDF DOI

An Ultra-Low Latency, End-to-End Streaming Speech Synthesis Architecture via Block-Wise Generation and Depth-Wise Codec Decoding

Tianhui Su, Tien-Ping Tan, Salima Mdhaffar, Yannick Esteve, Aghilas Sini · 2026

Real-time speech synthesis requires balancing inference latency and acoustic fidelity for interactive applications. Conventional continuous text-to-speech pipelines require computationally intensive n…

Engineering Preprint PDF DOI

Contextual Biasing for ASR in Speech LLM with Common Word Cues and Bias Word Position Prediction

Sashi Novitasari, Takashi Fukuda, Kurata Gakuto, George Saon · 2026

Speech-aware LLMs (SLLMs) have recently achieved state-of-the-art ASR performance; however, they still fail to accurately transcribe bias words that appear rarely or never in the training data. Contex…

Engineering Preprint PDF DOI

Robotic Nanoparticle Synthesis via Solution-based Processes

Dasharadhan Mahalingam, Michael Gallagher, Nilanjan Chakraborty, Stanislaus S. Wong · 2026

We present a screw geometry-based manipulation planning framework for the robotic automation of solution-based synthesis, exemplified through the preparation of gold and magnetite nanoparticles. The s…

Engineering Preprint PDF DOI

Why Your Tokenizer Fails in Information Fusion: A Timing-Aware Pre-Quantization Fusion for Video-Enhanced Audio Tokenization

Xiangyu Zhang, Benjamin John Southwell, Siqi Pan, Xinlei Niu, Beena Ahmed, Julien Epps · 2026

Audio tokenization has emerged as a critical component in end-to-end audio language models, enabling efficient discrete representation learning for both audio understanding and generation tasks. Howev…

Engineering Preprint PDF DOI

M2HRI: An LLM-Driven Multimodal Multi-Agent Framework for Personalized Human-Robot Interaction

Shaid Hasan, Breenice Lee, Sujan Sarker, Tariq Iqbal · 2026

Multi-robot systems hold significant promise for social environments such as homes and hospitals, yet existing multi-robot works treat robots as functionally identical, overlooking how robots individu…

Engineering Preprint PDF DOI

StarVLA-$\alpha$: Reducing Complexity in Vision-Language-Action Systems

Jinhui Ye, Ning Gao, Senqiao Yang, Jinliang Zheng, Zixuan Wang, Yuxin Chen, Pengguang Chen, Yilun Chen, Shu Liu, Jiaya Jia · 2026

Vision-Language-Action (VLA) models have recently emerged as a promising paradigm for building general-purpose robotic agents. However, the VLA landscape remains highly fragmented and complex: as exis…

Engineering Preprint PDF DOI

Grounded World Model for Semantically Generalizable Planning

Quanyi Li, Lan Feng, Haonan Zhang, Wuyang Li, Letian Wang, Alexandre Alahi, Harold Soh · 2026

In Model Predictive Control (MPC), world models predict the future outcomes of various action proposals, which are then scored to guide the selection of the optimal action. For visuomotor MPC, the sco…
