Alejandro Ribeiro · Engineering · Preprint — Research Repository

LaST-R1: Reinforcing Action via Adaptive Physical Latent Reasoning for VLA Models

Hao Chen, Jiaming Liu, Zhonghao Yan, Nuowei Han, Renrui Zhang, Chenyang Gu, Jialin Gao, Ziyu Guo, Siyuan Qian, Yinxi Wang, Peng Jia, Chi-Wing Fu, Shanghang Zhang, Pheng-Ann Heng · 2026

Vision-Language-Action (VLA) models have increasingly incorporated reasoning mechanisms for complex robotic manipulation. However, existing approaches share a critical limitation: whether employing ex…

Privileged Foresight Distillation: Zero-Cost Future Correction for World Action Models

Pengcheng Fang, Hongli Chen, Xiaohao Cai · 2026

World action models jointly predict future video and action during training, raising an open question about what role the future-prediction branch actually plays. A recent finding shows that this bran…

Learning from the Best: Smoothness-Driven Metrics for Data Quality in Imitation Learning

Soham Kulkarni, Raayan Dhar, Yuchen Cui · 2026

In behavioral cloning (BC), policy performance is fundamentally limited by demonstration data quality. Real-world datasets contain trajectories of varying quality due to operator skill differences, te…

dWorldEval: Scalable Robotic Policy Evaluation via Discrete Diffusion World Model

Yaxuan Li, Zhongyi Zhou, Yefei Chen, Yaokai Xue, Yichen Zhu · 2026

Evaluating robotics policies across thousands of environments and thousands of tasks is infeasible with existing approaches. This motivates the need for a new methodology for scalable robotics policy …

CorridorVLA: Explicit Spatial Constraints for Generative Action Heads via Sparse Anchors

Dachong Li, ZhuangZhuang Chen, Jin Zhang, Jianqiang Li · 2026

Vision-Language-Action (VLA) models often use intermediate representations to connect multimodal inputs with continuous control, yet spatial guidance is often injected implicitly through latent feat…

PokeVLA: Empowering Pocket-Sized Vision-Language-Action Model with Comprehensive World Knowledge Guidance

Yupeng Zheng, Xiang Li, Songen Gu, Yuhang Zheng, Shuai Tian, Weize Li, Linbo Wang, Senyu Fei, Pengfei Li, Yinfeng Gao, Zebin Xing, Yilun Chen, Qichao Zhang, Haoran Li, Wenchao Ding · 2026

Recent advances in Vision-Language-Action (VLA) models have opened new avenues for robot manipulation, yet existing methods exhibit limited efficiency and a lack of high-level knowledge and spatial aw…

Mask World Model: Predicting What Matters for Robust Robot Policy Learning

Yunfan Lou, Xiaowei Chi, Xiaojie Zhang, Zezhong Qian, Chengxuan Li, Rongyu Zhang, Yaoxu Lyu, Guoyu Song, Chuyao Fu, Haoxuan Xu, Pengwei Wang, Shanghang Zhang · 2026

World models derived from large-scale video generative pre-training have emerged as a promising paradigm for generalist robot policy learning. However, standard approaches often focus on high-fidelity…

OFlow: Injecting Object-Aware Temporal Flow Matching for Robust Robotic Manipulation

Kuanning Wang, Ke Fan, Chenhao Qiu, Zeyu Shangguan, Yuqian Fu, Yanwei Fu, Daniel Seita, Xiangyang Xue · 2026

Robust robotic manipulation requires not only predicting how the scene evolves over time, but also recognizing task-relevant objects in complex scenes. However, existing VLA models face two limitation…

AnchorRefine: Synergy-Manipulation Based on Trajectory Anchor and Residual Refinement for Vision-Language-Action Models

Tingzheng Jia, Kan Guo, Lanping Qian, Yongli Hu, Daxin Tian, Guixian Qu, Chunmian Lin, Baocai Yin, Jiapu Wang · 2026

Precision-critical manipulation requires both global trajectory organization and local execution correction, yet most vision-language-action (VLA) policies generate actions within a single unified spa…

OmniVLA-RL: A Vision-Language-Action Model with Spatial Understanding and Online RL

Haoxiang Jie, Yaoyuan Yan, Xiangyu Wei, Kailin Wang, Hongjie Yan, Zhiyou Heng, Daocheng Chen · 2026

Visual-Language-Action (VLA) models represent a paradigm shift in embodied AI, yet existing frameworks often struggle with imprecise spatial perception, suboptimal multimodal fusion, and instability i…

StarVLA-α: Reducing Complexity in Vision-Language-Action Systems

Jinhui Ye, Ning Gao, Senqiao Yang, Jinliang Zheng, Zixuan Wang, Yuxin Chen, Pengguang Chen, Yilun Chen, Shu Liu, Jiaya Jia · 2026

Vision-Language-Action (VLA) models have recently emerged as a promising paradigm for building general-purpose robotic agents. However, the VLA landscape remains highly fragmented and complex: as exis…

STRONG-VLA: Decoupled Robustness Learning for Vision-Language-Action Models under Multimodal Perturbations

Yuhan Xie, Yuping Yan, Yunqi Zhao, Handing Wang, Yaochu Jin · 2026

Despite their strong performance in embodied tasks, recent Vision-Language-Action (VLA) models remain highly fragile under multimodal perturbations, where visual corruption and linguistic noise jointl…

ProGAL-VLA: Grounded Alignment through Prospective Reasoning in Vision-Language-Action Models

Nastaran Darabi, Amit Ranjan Trivedi · 2026

Vision language action (VLA) models enable generalist robotic agents but often exhibit language ignorance, relying on visual shortcuts and remaining insensitive to instruction changes. We present Pros…

A1: A Fully Transparent Open-Source, Adaptive and Efficient Truncated Vision-Language-Action Model

Kaidong Zhang, Jian Zhang, Rongtao Xu, Yu Sun, Shuoshuo Xue, Youpeng Wen, Xiaoyu Guo, Minghao Guo, Weijia Liufu, Liu Zihou, Kangyi Ji, Yangsong Zhang, Jiarun Zhu, Jingzhi Liu, Zihang Li, Ruiyi Chen, Meng Cao, Jingming Zhang, Shen Zhao, Xiaojun Chang, Feng Zheng, Ivan Laptev, Xiaodan Liang · 2026

Vision-Language-Action (VLA) models have emerged as a powerful paradigm for open-world robot manipulation, but their practical deployment is often constrained by cost: billion-scale VLM backbones and …

StarVLA: A Lego-like Codebase for Vision-Language-Action Model Developing

StarVLA Community · 2026

Building generalist embodied agents requires integrating perception, language understanding, and action, which are core capabilities addressed by Vision-Language-Action (VLA) approaches based on multi…

The Compression Gap: Why Discrete Tokenization Limits Vision-Language-Action Model Scaling

Takuya Shiba · 2026

Scaling Vision-Language-Action (VLA) models by upgrading the vision encoder is expected to improve downstream manipulation performance, as it does in vision-language modeling. We show that this expect…

CLaD: Planning with Grounded Foresight via Cross-Modal Latent Dynamics

Andrew Jeong, Jaemin Kim, Sebin Lee, Sung-Eui Yoon · 2026

Robotic manipulation involves kinematic and semantic transitions that are inherently coupled via underlying actions. However, existing approaches plan within either semantic or latent space without ex…

ProgressVLA: Progress-Guided Diffusion Policy for Vision-Language Robotic Manipulation

Hongyu Yan, Qiwei Li, Jiaolong Yang, Yadong Mu · 2026

Most existing vision-language-action (VLA) models for robotic manipulation lack progress awareness, typically relying on hand-crafted heuristics for task termination. This limitation is particularly s…

VLA-OPD: Bridging Offline SFT and Online RL for Vision-Language-Action Models via On-Policy Distillation

Zhide Zhong, Haodong Yan, Junfeng Li, Junjie He, Tianran Zhang, Haoang Li · 2026

Although pre-trained Vision-Language-Action (VLA) models exhibit impressive generalization in robotic manipulation, post-training remains crucial to ensure reliable performance during deployment. Howe…

DFM-VLA: Iterative Action Refinement for Robot Manipulation via Discrete Flow Matching

Jiayi Chen, Wenxuan Song, Shuai Chen, Jingbo Wang, Zhijun Li, Haoang Li · 2026

Vision-Language-Action (VLA) models that encode actions using a discrete tokenization scheme are increasingly adopted for robotic manipulation, but existing decoding paradigms remain fundamentally l…
