Programming Languages in Engineering — Research Repository

Engineering Preprint PDF DOI

Language-Free Generative Editing from One Visual Example

Omar Elezabi, Eduard Zamfir, Zongwei Wu, Radu Timofte · 2026

Text-guided diffusion models have advanced image editing by enabling intuitive control through language. However, despite their strong capabilities, we surprisingly find that SOTA methods struggle wit…

MMaDA-VLA: Large Diffusion Vision-Language-Action Model with Unified Multi-Modal Instruction and Generation

Yang Liu, Pengxiang Ding, Tengyue Jiang, Xudong Wang, Wenxuan Song, Minghui Lin, Han Zhao, Hongyin Zhang, Zifeng Zhuang, Wei Zhao, Siteng Huang, Jinkui Shi, Donglin Wang · 2026

Vision-Language-Action (VLA) models aim to control robots for manipulation from visual observations and natural-language instructions. However, existing hierarchical and autoregressive paradigms often…

XBRLTagRec: Domain-Specific Fine-Tuning and Zero-Shot Re-Ranking with LLMs for Extreme Financial Numeral Labeling

Gang Hu, Qun Zhang, Jingyao Luo, Yile Jiang, Jing Chai, Haiyan Ding · 2026

Publicly traded companies must disclose financial information under regulations of the Securities and Exchange Commission (SEC) and the Generally Accepted Accounting Principles (GAAP). The eXtensible …

Development of ML model for triboelectric nanogenerator based sign language detection system

Meshv Patel, Bikash Baro, Sayan Bayan, Mohendra Roy · 2026

Sign language recognition (SLR) is vital for bridging communication gaps between deaf and hearing communities. Vision-based approaches suffer from occlusion, computational costs, and physical constrai…

Large Language Models as Optimization Controllers: Adaptive Continuation for SIMP Topology Optimization

Shaoliang Yang, Jun Wang, Yunsheng Wang · 2026

We present a framework in which a large language model (LLM) acts as an online adaptive controller for SIMP topology optimization, replacing conventional fixed-schedule continuation with real-time, st…

ETA-VLA: Efficient Token Adaptation via Temporal Fusion and Intra-LLM Sparsification for Vision-Language-Action Models

Yiru Wang, Anqing Jiang, Shuo Wang, Yuwen Heng, Zichong Gu, Hao Sun · 2026

The integration of Vision-Language-Action (VLA) models into autonomous driving systems offers a unified framework for interpreting complex scenes and executing control commands. However, the necessity…

ThermoAct: Thermal-Aware Vision-Language-Action Models for Robotic Perception and Decision-Making

Young-Chae Son, Dae-Kwan Ko, Yoon-Ji Choi, Soo-Chul Lim · 2026

In recent human-robot collaboration environments, there is a growing focus on integrating diverse sensor data beyond visual information to enable safer and more intelligent task execution. Although th…

$\pi$, But Make It Fly: Physics-Guided Transfer of VLA Models to Aerial Manipulation

Johnathan Tucker, Denis Liu, Aiden Swann, Allen Ren, Javier Yu, Jiankai Sun, Brandon Kim, Lachlain McGranahan, Quan Vuong, Mac Schwager · 2026

Vision-Language-Action (VLA) models such as $\pi_0$ have demonstrated remarkable generalization across diverse fixed-base manipulators. However, transferring these foundation models to aerial platform…

Learning Rollout from Sampling: An R1-Style Tokenized Traffic Simulation Model

Ziyan Wang, Peng Chen, Ding Li, Chiwei Li, Qichao Zhang, Zhongpu Xia, Guizhen Yu · 2026

Learning diverse and high-fidelity traffic simulations from human driving demonstrations is crucial for autonomous driving evaluation. The recent next-token prediction (NTP) paradigm, widely adopted i…

SABER: A Stealthy Agentic Black-Box Attack Framework for Vision-Language-Action Models

Xiyang Wu, Guangyao Shi, Qingzi Wang, Zongxia Li, Amrit Singh Bedi, Dinesh Manocha · 2026

Vision-language-action (VLA) models enable robots to follow natural-language instructions grounded in visual observations, but the instruction channel also introduces a critical vulnerability: small t…

Cyber-Physical System Design Space Exploration for Affordable Precision Agriculture

Pawan Kumar, Hokeun Kim · 2026

Precision agriculture promises higher yields and sustainability, but adoption is slowed by the high cost of cyber-physical systems (CPS) and the lack of systematic design methods. We present a cost-aw…

3D-Mix for VLA: A Plug-and-Play Module for Integrating VGGT-based 3D Information into Vision-Language-Action Models

Bin Yu, Shijie Lian, Xiaopeng Lin, Zhaolong Shen, Yuliang Wei, Haishan Liu, Changti Wu, Hang Yuan, Bailing Wang, Cong Huang, Kai Chen · 2026

Vision-Language-Action (VLA) models leverage Multimodal Large Language Models (MLLMs) for robotic control, but recent studies reveal that MLLMs exhibit limited spatial intelligence due to training pre…

LATS: Large Language Model Assisted Teacher-Student Framework for Multi-Agent Reinforcement Learning in Traffic Signal Control

Yifeng Zhang, Peizhuo Li, Tingguang Zhou, Mingfeng Fan, Guillaume Sartoretti · 2026

Adaptive Traffic Signal Control (ATSC) aims to optimize traffic flow and minimize delays by adjusting traffic lights in real time. Recent advances in Multi-agent Reinforcement Learning (MARL) have sho…

Environment-Grounded Multi-Agent Workflow for Autonomous Penetration Testing

Michael Somma, Markus Großpointner, Paul Zabalegui, Eppu Heilimo, Branka Stojanovic · 2026

The increasing complexity and interconnectivity of digital infrastructures make scalable and reliable security assessment methods essential. Robotic systems represent a particularly important class of…

How Open is Open TTS? A Practical Evaluation of Open Source TTS Tools

Teodora Ragman, Adrian Bogdan Stanea, Horia Cucu, Adriana Stan · 2026

Open-source text-to-speech (TTS) frameworks have emerged as highly adaptable platforms for developing speech synthesis systems across a wide range of languages. However, their applicability is not uni…

ReMemNav: A Rethinking and Memory-Augmented Framework for Zero-Shot Object Navigation

Feng Wu, Wei Zuo, Wenliang Yang, Jun Xiao, Yang Liu, Xinhua Zeng · 2026

Zero-shot object navigation requires agents to locate unseen target objects in unfamiliar environments without prior maps or task-specific training, which remains a significant challenge. Although rece…

SOMA: Strategic Orchestration and Memory-Augmented System for Vision-Language-Action Model Robustness via In-Context Adaptation

Zhuoran Li, Zhiyang Li, Kaijun Zhou, Jinyu Gu · 2026

Despite the promise of Vision-Language-Action (VLA) models as generalist robotic controllers, their robustness against perceptual noise and environmental variations in out-of-distribution (OOD) tasks …

ACAVCaps: Enabling large-scale training for fine-grained and diverse audio understanding

Yadong Niu, Tianzi Wang, Heinrich Dinkel, Xingwei Sun, Jiahao Zhou, Gang Li, Jizhong Liu, Junbo Zhang, Jian Luan · 2026

General audio understanding is a fundamental goal for large audio-language models, with audio captioning serving as a cornerstone task for their development. However, progress in this domain is hinder…

QuadFM: Foundational Text-Driven Quadruped Motion Dataset for Generation and Control

Li Gao, Fuzhi Yang, Jianhui Chen, Liu Liu, Yao Zheng, Yang Cai, Ziqiao Li · 2026

Despite significant advances in quadrupedal robotics, a critical gap persists in foundational motion resources that holistically integrate diverse locomotion, emotionally expressive behaviors, and ric…

Event-Driven Proactive Assistive Manipulation with Grounded Vision-Language Planning

Fengkai Liu, Hao Su, Haozhuang Chi, Rui Geng, Congzhi Ren, Xuqing Liu, Yucheng Xu, Yuichi Ohsita, Liyun Zhang · 2026

Assistance in collaborative manipulation is often initiated by user instructions, making high-level reasoning request-driven. In fluent human teamwork, however, partners often infer the next helpful s…
