Alejandro Ribeiro — Research Repository

Engineering Preprint PDF DOI

LaST-R1: Reinforcing Action via Adaptive Physical Latent Reasoning for VLA Models

Hao Chen, Jiaming Liu, Zhonghao Yan, Nuowei Han, Renrui Zhang, Chenyang Gu, Jialin Gao, Ziyu Guo, Siyuan Qian, Yinxi Wang, Peng Jia, Chi-Wing Fu, Shanghang Zhang, Pheng-Ann Heng · 2026

Vision-Language-Action (VLA) models have increasingly incorporated reasoning mechanisms for complex robotic manipulation. However, existing approaches share a critical limitation: whether employing ex…

AI & Data Science Preprint PDF DOI

PRTS: A Primitive Reasoning and Tasking System via Contrastive Representations

Yang Zhang, Jiangyuan Zhao, Chenyou Fan, Fangzheng Yan, Tian Li, Haitong Tang, Sen Fu, Xuan'er Wu, Qizhen Weng, Weinan Zhang, Xiu Li, Chi Zhang, Chenjia Bai, Xuelong Li · 2026

Vision-Language-Action (VLA) models advance robotic control via strong visual-linguistic priors. However, existing VLAs predominantly frame pretraining as supervised behavior cloning, overlooking the …

Mathematics Preprint PDF DOI

On the monotonicity of affine quermassintegrals

Shibing Chen, Yuanyuan Li, Xianduo Wang · 2026

Lutwak's affine quermassintegral theory is a foundational component of modern affine Brunn--Minkowski theory. Developed in the 1980s, it provides affine analogues of the classical quermassintegrals an…

Computer Science Preprint PDF DOI

Application-Aware Twin-in-the-Loop Planning for Federated Split Learning over Wireless Edge Networks

Zihao Ding, Beining Wu, Jun Huang, Shiwen Mao · 2026

We investigate task-success-oriented resource allocation for federated split learning (FSL) at the wireless edge. In this setting, the server must jointly determine bandwidth, transmit power, split-la…

Engineering Preprint PDF DOI

Privileged Foresight Distillation: Zero-Cost Future Correction for World Action Models

Pengcheng Fang, Hongli Chen, Xiaohao Cai · 2026

World action models jointly predict future video and action during training, raising an open question about what role the future-prediction branch actually plays. A recent finding shows that this bran…

AI & Data Science Preprint PDF DOI

CF-VLA: Efficient Coarse-to-Fine Action Generation for Vision-Language-Action Policies

Fan Du, Feng Yan, Jianxiong Wu, Xinrun Xu, Weiye Zhang, Weinong Wang, Yu Guo, Bin Qian, Zhihai He, Fei Wang, Heng Yang · 2026

Flow-based vision-language-action (VLA) policies offer strong expressivity for action generation, but suffer from a fundamental inefficiency: multi-step inference is required to recover action structu…

Mathematics Preprint PDF DOI

PRP, HS and LS Conjugate Gradient Methods for Interval-Valued Multiobjective Optimization Problems

Tapas Mondal, Debdulal Ghosh, Zai-Yun Peng, Yong Zhao · 2026

In this article, we develop an efficient algorithm based on three special variants of the nonlinear conjugate gradient method, namely, the Polak--Ribiere--Polyak, Hestenes--Stiefel, and Liu--Storey sch…

Engineering Preprint PDF DOI

Learning from the Best: Smoothness-Driven Metrics for Data Quality in Imitation Learning

Soham Kulkarni, Raayan Dhar, Yuchen Cui · 2026

In behavioral cloning (BC), policy performance is fundamentally limited by demonstration data quality. Real-world datasets contain trajectories of varying quality due to operator skill differences, te…

Engineering Preprint PDF DOI

dWorldEval: Scalable Robotic Policy Evaluation via Discrete Diffusion World Model

Yaxuan Li, Zhongyi Zhou, Yefei Chen, Yaokai Xue, Yichen Zhu · 2026

Evaluating robotics policies across thousands of environments and thousands of tasks is infeasible with existing approaches. This motivates the need for a new methodology for scalable robotics policy …

Engineering Preprint PDF DOI

CorridorVLA: Explicit Spatial Constraints for Generative Action Heads via Sparse Anchors

Dachong Li, ZhuangZhuang Chen, Jin Zhang, Jianqiang Li · 2026

Vision--Language--Action (VLA) models often use intermediate representations to connect multimodal inputs with continuous control, yet spatial guidance is often injected implicitly through latent feat…

Engineering Preprint PDF DOI

PokeVLA: Empowering Pocket-Sized Vision-Language-Action Model with Comprehensive World Knowledge Guidance

Yupeng Zheng, Xiang Li, Songen Gu, Yuhang Zheng, Shuai Tian, Weize Li, Linbo Wang, Senyu Fei, Pengfei Li, Yinfeng Gao, Zebin Xing, Yilun Chen, Qichao Zhang, Haoran Li, Wenchao Ding · 2026

Recent advances in Vision-Language-Action (VLA) models have opened new avenues for robot manipulation, yet existing methods exhibit limited efficiency and a lack of high-level knowledge and spatial aw…

Engineering Preprint PDF DOI

Mask World Model: Predicting What Matters for Robust Robot Policy Learning

Yunfan Lou, Xiaowei Chi, Xiaojie Zhang, Zezhong Qian, Chengxuan Li, Rongyu Zhang, Yaoxu Lyu, Guoyu Song, Chuyao Fu, Haoxuan Xu, Pengwei Wang, Shanghang Zhang · 2026

World models derived from large-scale video generative pre-training have emerged as a promising paradigm for generalist robot policy learning. However, standard approaches often focus on high-fidelity…

AI & Data Science Preprint PDF DOI

HELM: Harness-Enhanced Long-horizon Memory for Vision-Language-Action Manipulation

Zijian Zeng, Fei Ding, Huiming Yang, Xianwei Li · 2026

Vision-Language-Action (VLA) models fail systematically on long-horizon manipulation tasks despite strong short-horizon performance. We show that this failure is not resolved by extending context leng…

AI & Data Science Preprint PDF DOI

Test-Time Perturbation Learning with Delayed Feedback for Vision-Language-Action Models

Zehua Zang, Xi Wang, Fuchun Sun, Xiao Xu, Lixiang Lium, Jiahuan Zhou, Jiangmeng Li · 2026

Vision-Language-Action models (VLAs) achieve remarkable performance in sequential decision-making but remain fragile to subtle environmental shifts, such as small changes in object pose. We attribute …

Engineering Preprint PDF DOI

OFlow: Injecting Object-Aware Temporal Flow Matching for Robust Robotic Manipulation

Kuanning Wang, Ke Fan, Chenhao Qiu, Zeyu Shangguan, Yuqian Fu, Yanwei Fu, Daniel Seita, Xiangyang Xue · 2026

Robust robotic manipulation requires not only predicting how the scene evolves over time, but also recognizing task-relevant objects in complex scenes. However, existing VLA models face two limitation…

Engineering Preprint PDF DOI

AnchorRefine: Synergy-Manipulation Based on Trajectory Anchor and Residual Refinement for Vision-Language-Action Models

Tingzheng Jia, Kan Guo, Lanping Qian, Yongli Hu, Daxin Tian, Guixian Qu, Chunmian Lin, Baocai Yin, Jiapu Wang · 2026

Precision-critical manipulation requires both global trajectory organization and local execution correction, yet most vision-language-action (VLA) policies generate actions within a single unified spa…

Engineering Preprint PDF DOI

OmniVLA-RL: A Vision-Language-Action Model with Spatial Understanding and Online RL

Haoxiang Jie, Yaoyuan Yan, Xiangyu Wei, Kailin Wang, Hongjie Yan, Zhiyou Heng, Daocheng Chen · 2026

Vision-Language-Action (VLA) models represent a paradigm shift in embodied AI, yet existing frameworks often struggle with imprecise spatial perception, suboptimal multimodal fusion, and instability i…

Engineering Preprint PDF DOI

StarVLA-$\alpha$: Reducing Complexity in Vision-Language-Action Systems

Jinhui Ye, Ning Gao, Senqiao Yang, Jinliang Zheng, Zixuan Wang, Yuxin Chen, Pengguang Chen, Yilun Chen, Shu Liu, Jiaya Jia · 2026

Vision-Language-Action (VLA) models have recently emerged as a promising paradigm for building general-purpose robotic agents. However, the VLA landscape remains highly fragmented and complex: as exis…

Mathematics Preprint PDF DOI

Faces of invariant convex sets in representations of nontrivial copolarity

Yi Shi · 2026

Let $(V, G)$ be an orthogonal representation of a compact Lie group $G$ with nontrivial copolarity, and $\Sigma$ a fat section of $(V, G)$. If $E$ is a $G$-invariant compact convex set in $V$, then $P…

Engineering Preprint PDF DOI

STRONG-VLA: Decoupled Robustness Learning for Vision-Language-Action Models under Multimodal Perturbations

Yuhan Xie, Yuping Yan, Yunqi Zhao, Handing Wang, Yaochu Jin · 2026

Despite their strong performance in embodied tasks, recent Vision-Language-Action (VLA) models remain highly fragile under multimodal perturbations, where visual corruption and linguistic noise jointl…
