Expertini Research

Browse Research Papers

9,775+ open-access research outputs.

Showing 9775 results for "programming languages" in Engineering
Engineering Preprint

Calibration-Reasoning Framework for Descriptive Speech Quality Assessment

Elizaveta Kostenok, Mathieu Salzmann, Milos Cernak · 2026

Explainable speech quality assessment requires moving beyond Mean Opinion Scores (MOS) to analyze underlying perceptual dimensions. To address this, we introduce a novel post-training method that tail…

Engineering Preprint

Cross-Hand Latent Representation for Vision-Language-Action Models

Guangqi Jiang, Yutong Liang, Jianglong Ye, Jia-Yang Huang, Changwei Jing, Rocky Duan, Pieter Abbeel, Xiaolong Wang, Xueyan Zou · 2026

Dexterous manipulation is essential for real-world robot autonomy, mirroring the central role of human hand coordination in daily activity. Humans rely on rich multimodal perception--vision, sound, an…

Engineering Preprint

AR-VLA: True Autoregressive Action Expert for Vision-Language-Action Models

Yutong Hu, Jan-Nico Zaech, Nikolay Nikolov, Yuanqi Yao, Sombit Dey, Giuliano Albanese, Renaud Detry, Luc Van Gool, Danda Paudel · 2026

We propose a standalone autoregressive (AR) Action Expert that generates actions as a continuous causal sequence while conditioning on refreshable vision-language prefixes. In contrast to existing Vis…

Engineering Preprint

TiPToP: A Modular Open-Vocabulary Planning System for Robotic Manipulation

William Shen, Nishanth Kumar, Sahit Chintalapudi, Jie Wang, Christopher Watson, Edward Hu, Jing Cao, Dinesh Jayaraman, Leslie Pack Kaelbling, Tomas Lozano-Perez · 2026

We present TiPToP, an extensible modular system that combines pretrained vision foundation models with an existing Task and Motion Planner (TAMP) to solve multi-step manipulation tasks directly from i…

Engineering Preprint

BEACON: Language-Conditioned Navigation Affordance Prediction under Occlusion

Xinyu Gao, Gang Chen, Javier Alonso-Mora · 2026

Language-conditioned local navigation requires a robot to infer a nearby traversable target location from its current observation and an open-vocabulary, relational instruction. Existing vision-langua…

Engineering Preprint

Let's Reward Step-by-Step: Step-Aware Contrastive Alignment for Vision-Language Navigation in Continuous Environments

Haoyuan Li, Rui Liu, Hehe Fan, Yi Yang · 2026

Vision-Language Navigation in Continuous Environments (VLN-CE) requires agents to learn complex reasoning from long-horizon human interactions. While Multi-modal Large Language Models (MLLMs) have dri…

Engineering Preprint

Finetuning a Text-to-Audio Model for Room Impulse Response Generation

Kirak Kim, Sungyoung Kim · 2026

Room Impulse Responses (RIRs) enable realistic acoustic simulation, with applications ranging from multimedia production to speech data augmentation. However, acquiring high-quality real-world RIRs is…

Engineering Preprint

Speech-Omni-Lite: Portable Speech Interfaces for Vision-Language Models

Dehua Tao, Xuan Luo, Daxin Tan, Kai Chen, Lanqing Hong, Jing Li, Ruifeng Xu, Xiao Chen · 2026

While large-scale omni-models have demonstrated impressive capabilities across various modalities, their strong performance heavily relies on massive multimodal data and incurs substantial computation…

Engineering Preprint

NS-VLA: Towards Neuro-Symbolic Vision-Language-Action Models

Ziyue Zhu, Shangyang Wu, Shuai Zhao, Zhiqiu Zhao, Shengjie Li, Yi Wang, Fang Li, Haoran Luo · 2026

Vision-Language-Action (VLA) models are formulated to ground instructions in visual context and generate action sequences for robotic manipulation. Despite recent progress, VLA models still face chall…

Engineering Preprint

Beyond Short-Horizon: VQ-Memory for Robust Long-Horizon Manipulation in Non-Markovian Simulation Benchmarks

Honghui Wang, Zhi Jing, Jicong Ao, Shiji Song, Xuelong Li, Gao Huang, Chenjia Bai · 2026

The high cost of collecting real-robot data has made robotic simulation a scalable platform for both evaluation and data generation. Yet most existing benchmarks concentrate on simple manipulation tas…

Engineering Preprint

StyleVLA: Driving Style-Aware Vision Language Action Model for Autonomous Driving

Yuan Gao, Dengyuan Hua, Mattia Piccinini, Finn Rasmus Schafer, Korbinian Moller, Lin Li, Johannes Betz · 2026

Vision Language Models (VLMs) bridge visual perception and linguistic reasoning. In Autonomous Driving (AD), this synergy has enabled Vision Language Action (VLA) models, which translate high-level mu…

Engineering Preprint

CORAL: Scalable Multi-Task Robot Learning via LoRA Experts

Yuankai Luo, Woping Chen, Tong Liang, Zhenguo Li · 2026

Deploying Vision-Language-Action (VLA) models in real-world robotics exposes a core multi-task learning challenge: reconciling task interference in multi-task robotic learning. When multiple tasks are…

Engineering Preprint

See, Plan, Rewind: Progress-Aware Vision-Language-Action Models for Robust Robotic Manipulation

Tingjun Dai, Mingfei Han, Tingwen Du, Zhiheng Liu, Zhihui Li, Salman Khan, Jun Yu, Xiaojun Chang · 2026

Measurement of task progress through explicit, actionable milestones is critical for robust robotic manipulation. This progress awareness enables a model to ground its current task status, anticipate …

Engineering Preprint

Acoustic and Semantic Modeling of Emotion in Spoken Language

Soumya Dutta · 2026

Emotions play a central role in human communication, shaping trust, engagement, and social interaction. As artificial intelligence systems powered by large language models become increasingly integrat…

Engineering Preprint

Optimization-Based Formation Flight on Libration Point Orbits

Yuri Shimane, Purnanand Elango, Avishai Weiss · 2026

A model predictive control (MPC) framework is developed for station-keeping in spacecraft formation flight along libration point orbits. At each control period, the MPC policy solves a multi-vehicle o…

Engineering Preprint

ZeroWBC: Learning Natural Visuomotor Humanoid Control Directly from Human Egocentric Video

Haoran Yang, Jiacheng Bao, Yucheng Xin, Haoming Song, Yuyang Tian, Bin Zhao, Dong Wang, Xuelong Li · 2026

Achieving versatile and naturalistic whole-body control for humanoid robot scene-interaction remains a significant challenge. While some recent works have demonstrated autonomous humanoid interactive …

Engineering Preprint

SPAN-Nav: Generalized Spatial Awareness for Versatile Vision-Language Navigation

Jiahang Liu, Tianyu Xu, Jiawei Chen, Lu Yue, Jiazhao Zhang, Zhiyong Wang, Minghan Li, Qisheng Zhao, Anqi Li, Qi Su, Zhizheng Zhang, He Wang · 2026

Recent embodied navigation approaches leveraging Vision-Language Models (VLMs) demonstrate strong generalization in versatile Vision-Language Navigation (VLN). However, reliable path planning in compl…

Engineering Preprint

DexHiL: A Human-in-the-Loop Framework for Vision-Language-Action Model Post-Training in Dexterous Manipulation

Yifan Han, Zhongxi Chen, Yuxuan Zhao, Congsheng Xu, Yanming Shao, Yichuan Peng, Yao Mu, Wenzhao Lian · 2026

While Vision-Language-Action (VLA) models have demonstrated promising generalization capabilities in robotic manipulation, deploying them on specific and complex downstream tasks still demands effecti…

Engineering Preprint

PM-Nav: Priori-Map Guided Embodied Navigation in Functional Buildings

Jiang Gao, Xiangyu Dong, Haozhou Li, Haoran Zhao, Yaoming Zhou, Xiaoguang Ma · 2026

Existing language-driven embodied navigation paradigms face challenges in functional buildings (FBs) with highly similar features, as they lack the ability to effectively utilize priori spatial knowle…

Engineering Preprint

Latent World Models for Automated Driving: A Unified Taxonomy, Evaluation Framework, and Open Challenges

Rongxiang Zeng, Yongqi Dong · 2026

Emerging generative world models and vision-language-action (VLA) systems are rapidly reshaping automated driving by enabling scalable simulation, long-horizon forecasting, and capability-rich decisio…
