Felipe Belem in Engineering — Research Repository

Engineering Preprint PDF DOI

Flip Stunts on Bicycle Robots using Iterative Motion Imitation

Jeonghwan Kim, Shamel Fahmi, Seungeun Rho, Sehoon Ha, Gabriel Nelson · 2026

This work demonstrates a front-flip on bicycle robots via reinforcement learning, particularly by imitating reference motions that are infeasible and imperfect. To address this, we propose Iterative M…

OmniClone: Engineering a Robust, All-Rounder Whole-Body Humanoid Teleoperation System

Yixuan Li, Le Ma, Yutang Lin, Yushi Du, Mengya Liu, Kaizhe Hu, Jieming Cui, Yixin Zhu, Wei Liang, Baoxiong Jia, Siyuan Huang · 2026

Whole-body humanoid teleoperation enables humans to remotely control humanoid robots, serving as both a real-time operational tool and a scalable engine for collecting demonstrations for autonomous le…

$\Psi_0$: An Open Foundation Model Towards Universal Humanoid Loco-Manipulation

Songlin Wei, Hongyi Jing, Boqian Li, Zhenyu Zhao, Jiageng Mao, Zhenhao Ni, Sicheng He, Jie Liu, Xiawei Liu, Kaidi Kang, Sheng Zang, Weiduo Yuan, Marco Pavone, Di Huang, Yue Wang · 2026

We introduce $\Psi_0$ (Psi-Zero), an open foundation model to address challenging humanoid loco-manipulation tasks. While existing approaches often attempt to address this fundamental problem by co-tr…

Thousand-GPU Large-Scale Training and Optimization Recipe for AI-Native Cloud Embodied Intelligence Infrastructure

Yongjian Guo, Yunxuan Ma, Haoran Sun, Zhong Guan, Shuai Di, Jing Long, Wanting Xu, Xiaodong Bai, Wen Huang, Yucheng Guo, Chen Zhou, Qiming Yang, Mingxi Luo, Tianyun Zhao, Hedan Yang, Song Wang, Xiaomeng Tian, Xiaolong Xiang, Zhen Sun, Yu Wei, Luqiao Wang, Yuzhen Li, Chenfeng Gu, Junwu Xiong, Yicheng Gong · 2026

Embodied intelligence is a key step towards Artificial General Intelligence (AGI), yet its development faces multiple challenges including data, frameworks, infrastructure, and evaluation systems. To …

MEM: Multi-Scale Embodied Memory for Vision Language Action Models

Marcel Torne, Karl Pertsch, Homer Walke, Kyle Vedder, Suraj Nair, Brian Ichter, Allen Z. Ren, Haohuan Wang, Jiaming Tang, Kyle Stachowicz, Karan Dhabalia, Michael Equi, Quan Vuong, Jost Tobias Springenberg, Sergey Levine, Chelsea Finn, Danny Driess · 2026

Conventionally, memory in end-to-end robotic learning involves inputting a sequence of past observations into the learned policy. However, in complex multi-stage real-world tasks, the robot's memory m…

Deep Accurate Solver for the Geodesic Problem

Saar Huberman, Amit Bracha, Ron Kimmel · 2026

A common approach to compute distances on continuous surfaces is by considering a discretized polygonal mesh approximating the surface and estimating distances on the polygon. We show that exact geode…

HALO: A Unified Vision-Language-Action Model for Embodied Multimodal Chain-of-Thought Reasoning

Quanxin Shou, Fangqi Zhu, Shawn Chen, Puxin Yan, Zhengyang Yan, Yikun Miao, Xiaoyi Pang, Zicong Hong, Ruikai Shi, Hao Huang, Jie Zhang, Song Guo · 2026

Vision-Language-Action (VLA) models have shown strong performance in robotic manipulation, but often struggle in long-horizon or out-of-distribution scenarios due to the lack of explicit mechanisms fo…

Scaling Law of Neural Koopman Operators

Abulikemu Abuduweili, Yuyang Pang, Feihan Li, Changliu Liu · 2026

Data-driven neural Koopman operator theory has emerged as a powerful tool for linearizing and controlling nonlinear robotic systems. However, the performance of these data-driven models fundamentally …

EgoScale: Scaling Dexterous Manipulation with Diverse Egocentric Human Data

Ruijie Zheng, Dantong Niu, Yuqi Xie, Jing Wang, Mengda Xu, Yunfan Jiang, Fernando Castaneda, Fengyuan Hu, You Liang Tan, Letian Fu, Trevor Darrell, Furong Huang, Yuke Zhu, Danfei Xu, Linxi Fan · 2026

Human behavior is among the most scalable sources of data for learning physical intelligence, yet how to effectively leverage it for dexterous manipulation remains unclear. While prior work demonstrat…

Discovering the mechanics of ultra-low density elastomeric foams in elite-level racing shoes

Jeremy A. McCulloch, Scott L. Delp, Ellen Kuhl · 2026

Ultra-low-density elastomeric foams enable lightweight systems that combine high compliance with efficient energy return. In high-performance racing shoes, these foams are critical for low weight, hig…

Xiaomi-Robotics-0: An Open-Sourced Vision-Language-Action Model with Real-Time Execution

Rui Cai, Jun Guo, Xinze He, Piaopiao Jin, Jie Li, Bingxuan Lin, Futeng Liu, Wei Liu, Fei Ma, Kun Ma, Feng Qiu, Heng Qu, Yifei Su, Qiao Sun, Dong Wang, Donghao Wang, Yunhong Wang, Rujie Wu, Diyun Xiang, Yu Yang, Hangjun Ye, Yuan Zhang, Quanyun Zhou · 2026

In this report, we introduce Xiaomi-Robotics-0, an advanced vision-language-action (VLA) model optimized for high performance and fast and smooth real-time execution. The key to our method lies in a c…

LAP: Language-Action Pre-Training Enables Zero-shot Cross-Embodiment Transfer

Lihan Zha, Asher J. Hancock, Mingtong Zhang, Tenny Yin, Yixuan Huang, Dhruv Shah, Allen Z. Ren, Anirudha Majumdar · 2026

A long-standing goal in robotics is a generalist policy that can be deployed zero-shot on new robot embodiments without per-embodiment adaptation. Despite large-scale multi-embodiment pre-training, ex…

VLA-JEPA: Enhancing Vision-Language-Action Model with Latent World Model

Jingwen Sun, Wenyao Zhang, Zekun Qi, Shaojie Ren, Zezhi Liu, Hanxin Zhu, Guangzhong Sun, Xin Jin, Zhibo Chen · 2026

Pretraining Vision-Language-Action (VLA) policies on internet-scale video is appealing, yet current latent-action objectives often learn the wrong thing: they remain anchored to pixel variation rather…

Rethinking Visual-Language-Action Model Scaling: Alignment, Mixture, and Regularization

Ye Wang, Sipeng Zheng, Hao Luo, Wanpeng Zhang, Haoqi Yuan, Chaoyi Xu, Haiweng Xu, Yicheng Feng, Mingyang Yu, Zhiyu Kang, Zongqing Lu, Qin Jin · 2026

While Vision-Language-Action (VLA) models show strong promise for generalist robot control, it remains unclear whether -- and under what conditions -- the standard "scale data" recipe translates to ro…

A Modern System Recipe for Situated Embodied Human-Robot Conversation with Real-Time Multimodal LLMs and Tool-Calling

Dong Won Lee, Sarah Gillet, Louis-Philippe Morency, Cynthia Breazeal, Hae Won Park · 2026

Situated embodied conversation requires robots to interleave real-time dialogue with active perception: deciding what to look at, when to look, and what to say under tight latency constraints. We pres…

RDT2: Exploring the Scaling Limit of UMI Data Towards Zero-Shot Cross-Embodiment Generalization

Songming Liu, Bangguo Li, Kai Ma, Lingxuan Wu, Hengkai Tan, Xiao Ouyang, Hang Su, Jun Zhu · 2026

Vision-Language-Action (VLA) models hold promise for generalist robotics but currently struggle with data scarcity, architectural inefficiencies, and the inability to generalize across different hardw…

Generative Artificial Intelligence creates delicious, sustainable, and nutritious burgers

Vahidullah Tac, Christopher Gardner, Ellen Kuhl · 2026

Food choices shape both human and planetary health; yet, designing foods that are delicious, nutritious, and sustainable remains challenging. Here we show that generative artificial intelligence can l…

Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization

Hao Luo, Ye Wang, Wanpeng Zhang, Sipeng Zheng, Ziheng Xi, Chaoyi Xu, Haiweng Xu, Haoqi Yuan, Chi Zhang, Yiqing Wang, Yicheng Feng, Zongqing Lu · 2026

We introduce Being-H0.5, a foundational Vision-Language-Action (VLA) model designed for robust cross-embodiment generalization across diverse robotic platforms. While existing VLAs often struggle with…

A Generalizable Framework for Building Executable Domain-Specific LLMs under Data Scarcity: Demonstration on Semiconductor TCAD Simulation

Di Wang, Zhenhua Wu, Yu Liu, Kai Chang, Shaohua Wu · 2026

Scientific and engineering verticals often suffer from data scarcity and strict executability requirements: models must generate not only fluent text, but also syntactically valid, tool-compilable scr…

CLARE: Continual Learning for Vision-Language-Action Models via Autonomous Adapter Routing and Expansion

Ralf Romer, Yi Zhang, Angela P. Schoellig · 2026

To teach robots complex manipulation tasks, it is now a common practice to fine-tune a pre-trained vision-language-action model (VLA) on task-specific data. However, since this recipe updates existing…
