Programming Languages in Engineering — Research Repository

Engineering Preprint PDF DOI

StreamVoiceAnon+: Emotion-Preserving Streaming Speaker Anonymization via Frame-Level Acoustic Distillation

Nikita Kuzmin, Kong Aik Lee, Eng Siong Chng · 2026

We address the challenge of preserving emotional content in streaming speaker anonymization (SA). Neural audio codec language models trained for audio continuation tend to degrade source emotion: cont…


Lifelong Embodied Navigation Learning

Xudong Wang, Jiahua Dong, Baichen Liu, Qi Lyu, Lianqing Liu, Zhi Han · 2026

Embodied navigation agents powered by large language models have shown strong performance on individual tasks but struggle to continually acquire new navigation skills, which suffer from catastrophic …


Restoring Linguistic Grounding in VLA Models via Train-Free Attention Recalibration

Ninghao Zhang, Bin Zhu, Shijie Zhou, Jingjing Chen · 2026

Vision-Language-Action (VLA) models enable robots to perform manipulation tasks directly from natural language instructions and are increasingly viewed as a foundation for generalist robotic policies.…


HarvestFlex: Strawberry Harvesting via Vision-Language-Action Policy Adaptation in the Wild

Ziyang Zhao, Shuheng Wang, Zhonghua Miao, Ya Xiong · 2026

This work presents the first study on transferring vision-language-action (VLA) policies to real greenhouse tabletop strawberry harvesting, a long-horizon, unstructured task challenged by occlusion an…


X-OPD: Cross-Modal On-Policy Distillation for Capability Alignment in Speech LLMs

Di Cao, Dongjie Fu, Hai Yu, Siqi Zheng, Xu Tan, Tao Jin · 2026

While the shift from cascaded dialogue systems to end-to-end (E2E) speech Large Language Models (LLMs) improves latency and paralinguistic modeling, E2E models often exhibit a significant performance …


AnyCamVLA: Zero-Shot Camera Adaptation for Viewpoint Robust Vision-Language-Action Models

Hyeongjun Heo, Seungyeon Woo, Sang Min Kim, Junho Kim, Junho Lee, Yonghyeon Lee, Young Min Kim · 2026

Despite remarkable progress in Vision-Language-Action models (VLAs) for robot manipulation, these large pre-trained models require fine-tuning to be deployed in specific environments. These fine-tuned…


STL-SVPIO: Signal Temporal Logic guided Stein Variational Path Integral Optimization

Hongrui Zheng, Zirui Zang, Ahmad Amine, Cristian Ioan Vasile, Rahul Mangharam · 2026

Signal Temporal Logic (STL) enables formal specification of complex spatiotemporal constraints for robotic task planning. However, synthesizing long-horizon continuous control trajectories from comple…


EmboAlign: Aligning Video Generation with Compositional Constraints for Zero-Shot Manipulation

Gehao Zhang, Zhenyang Ni, Payal Mohapatra, Han Liu, Ruohan Zhang, Qi Zhu · 2026

Video generative models (VGMs) pretrained on large-scale internet data can produce temporally coherent rollout videos that capture rich object dynamics, offering a compelling foundation for zero-shot …


Safe-Night VLA: Seeing the Unseen via Thermal-Perceptive Vision-Language-Action Models for Safety-Critical Manipulation

Dian Yu, Qingchuan Zhou, Bingkun Huang, Majid Khadiv, Zewen Yang · 2026

Current Vision-Language-Action (VLA) models rely primarily on RGB perception, preventing them from capturing modalities such as thermal signals that are imperceptible to conventional visual sensors. M…


Vision-Language System using Open-Source LLMs for Gestures in Medical Interpreter Robots

Thanh-Tung Ngo, Emma Murphy, Robert J. Ross · 2026

Effective communication is vital in healthcare, especially across language barriers, where non-verbal cues and gestures are critical. This paper presents a privacy-preserving vision-language framework…


Autonomous Algorithm Discovery for Ptychography via Evolutionary LLM Reasoning

Xiangyu Yin, Ming Du, Junjing Deng, Zhi Yang, Yimo Han, Yi Jiang · 2026

Ptychography is a computational imaging technique widely used for high-resolution materials characterization, but high-quality reconstructions often require the use of regularization functions that la…


Relational Semantic Reasoning on 3D Scene Graphs for Open World Interactive Object Search

Imen Mahdi, Matteo Cassinelli, Fabien Despinoy, Tim Welschehold, Abhinav Valada · 2026

Open-world interactive object search in household environments requires understanding semantic relationships between objects and their surrounding context to guide exploration efficiently. Prior metho…


RACAS: Controlling Diverse Robots With a Single Agentic System

Dylan R. Ashley, Jan Przepiora, Yimeng Chen, Ali Abualsaud, Nurzhan Yesmagambet, Shinkyu Park, Eric Feron, Jurgen Schmidhuber · 2026

Many robotic platforms expose an API through which external software can command their actuators and read their sensors. However, transitioning from these low-level interfaces to high-level autonomous…


Observing and Controlling Features in Vision-Language-Action Models

Hugo Buurmeijer, Carmen Amo Alonso, Aiden Swann, Marco Pavone · 2026

Vision-Language-Action Models (VLAs) have shown remarkable progress towards embodied intelligence. While their architecture partially resembles that of Large Language Models (LLMs), VLAs exhibit highe…


PhysiFlow: Physics-Aware Humanoid Whole-Body VLA via Multi-Brain Latent Flow Matching and Robust Tracking

Weikai Qin, Sichen Wu, Ci Chen, Mengfan Liu, Linxi Feng, Xinru Cui, Haoqi Han, Hesheng Wang · 2026

In the domain of humanoid robot control, the fusion of Vision-Language-Action (VLA) with whole-body control is essential for semantically guided execution of real-world tasks. However, existing method…


PRISM: Personalized Refinement of Imitation Skills for Manipulation via Human Instructions

Arnau Boix-Granell, Alberto San-Miguel-Tello, Magi Dalmau-Moreno, Nestor Garcia · 2026

This paper presents PRISM: an instruction-conditioned refinement method for imitation policies in robotic manipulation. This approach bridges Imitation Learning (IL) and Reinforcement Learning (RL) fr…


OpenFrontier: General Navigation with Visual-Language Grounded Frontiers

Esteban Padilla, Boyang Sun, Marc Pollefeys, Hermann Blum · 2026

Open-world navigation requires robots to make decisions in complex everyday environments while adapting to flexible task requirements. Conventional navigation approaches often rely on dense 3D reconst…


Iterative On-Policy Refinement of Hierarchical Diffusion Policies for Language-Conditioned Manipulation

Clemence Grislain, Olivier Sigaud, Mohamed Chetouani · 2026

Hierarchical policies for language-conditioned manipulation decompose tasks into subgoals, where a high-level planner guides a low-level controller. However, these hierarchical agents often fail becau…


Critic in the Loop: A Tri-System VLA Framework for Robust Long-Horizon Manipulation

Pengfei Yi, Yingjie Ma, Wenjiang Xu, Yanan Hao, Shuai Gan, Wanting Li, Shanlin Zhong · 2026

Balancing high-level semantic reasoning with low-level reactive control remains a core challenge in visual robotic manipulation. While Vision-Language Models (VLMs) excel at cognitive planning, their …


Lifelong Language-Conditioned Robotic Manipulation Learning

Xudong Wang, Zebin Han, Zhiyu Liu, Gan Li, Jiahua Dong, Baichen Liu, Lianqing Liu, Zhi Han · 2026

When traditional language-conditioned manipulation agents adapt sequentially to new manipulation skills, they catastrophically forget old skills, which limits practical deployment in dynamic scenes. In thi…

