Michael Malocha in Engineering — Research Repository

Engineering Preprint PDF DOI

How Open is Open TTS? A Practical Evaluation of Open Source TTS Tools

Teodora Ragman, Adrian Bogdan Stanea, Horia Cucu, Adriana Stan · 2026

Open-source text-to-speech (TTS) frameworks have emerged as highly adaptable platforms for developing speech synthesis systems across a wide range of languages. However, their applicability is not uni…

Engineering Preprint PDF DOI

Multiview Progress Prediction of Robot Activities

Elena Zoppellari, Federico Becattini, Marco Fiorucci, Lamberto Ballan · 2026

For robots to operate effectively and safely alongside humans, they must be able to understand the progress of ongoing actions. This ability, known as action progress prediction, is critical for tasks…

Engineering Preprint PDF DOI

Benchmarking Affordance Generalization with BusyBox

Dean Fortier, Timothy Adamson, Tess Hellebrekers, Teresa LaScala, Kofi Ennin, Michael Murray, Andrey Kolobov, Galen Mullins · 2026

Vision-Language-Action (VLA) models have been attracting the attention of researchers and practitioners thanks to their promise of generalization. Although single-task policies still offer competitive…

Engineering Preprint PDF DOI

Streaming Speech Recognition with Decoder-Only Large Language Models and Latency Optimization

Genshun Wan, Wenhui Zhang, Jing-Xuan Zhang, Shifu Xiong, Jianqing Gao, Zhongfu Ye · 2026

Recent advances have demonstrated the potential of decoder-only large language models (LLMs) for automatic speech recognition (ASR). However, enabling streaming recognition within this framework remain…

Engineering Preprint PDF DOI

Coverage Performance Analysis of FAS-enhanced LoRa Wide Area Networks under both Co-SF and Inter-SF Interference

Gaoze Mu, Yanzhao Hou, Mingjie Chen, Yuanyu Hu, Yongan Zheng, Qimei Cui, Xiaofeng Tao · 2026

This paper presents an analytical framework for evaluating the coverage performance of the fluid antenna system (FAS)-enhanced LoRa wide-area networks (LoRaWANs). We investigate the effects of large-s…

Engineering Preprint PDF DOI

LoLA: Long Horizon Latent Action Learning for General Robot Manipulation

Xiaofan Wang, Xingyu Gao, Jianlong Fu, Zuolei Li, Dean Fortier, Galen Mullins, Andrey Kolobov, Baining Guo · 2025

The capability of performing long-horizon, language-guided robotic manipulation tasks critically relies on leveraging historical information and generating coherent action sequences. However, such cap…

Engineering Preprint PDF DOI

Experiences from Benchmarking Vision-Language-Action Models for Robotic Manipulation

Yihao Zhang, Yuankai Qi, Xi Zheng · 2025

Foundation models applied in robotics, particularly Vision-Language-Action (VLA) models, hold great promise for achieving general-purpose manipulation. Yet, systematic real-world evaluation…

Engineering Preprint PDF DOI

Uplink SCMA-empowered Uncoordinated Random Access for Future mMTC

Pengyu Gao, Qu Luo, Jing Zhu, Gaojie Chen, Pei Xiao, Chuan Heng Foh · 2025

In this paper, a novel uncoordinated random access (URA) protocol is presented to address the pressing demand for massive connectivity with low access latency in future massive machine type communicat…

Engineering Preprint PDF DOI

ALOHA2 Robot Kitchen Application Scenario Reproduction Report

Haoyang Wu, Siheng Wu, William X. Liu, Fangui Zeng · 2025

ALOHA2 is an enhanced version of the dual-arm teleoperated robot ALOHA, featuring higher performance and robustness compared to the original design, while also being more ergonomic. Like ALOHA, ALOHA2…

Engineering Preprint PDF DOI

EMMA: Scaling Mobile Manipulation via Egocentric Human Data

Lawrence Y. Zhu, Pranav Kuppili, Ryan Punamiya, Patcharapong Aphiwetsa, Dhruv Patel, Simar Kareer, Sehoon Ha, Danfei Xu · 2025

Scaling mobile manipulation imitation learning is bottlenecked by expensive mobile robot teleoperation. We present Egocentric Mobile MAnipulation (EMMA), an end-to-end framework training mobile manipu…

Engineering Preprint PDF DOI

Improving Generalization Ability of Robotic Imitation Learning by Resolving Causal Confusion in Observations

Yifei Chen, Yuzhe Zhang, Giovanni D'urso, Nicholas Lawrance, Brendan Tidd · 2025

Recent developments in imitation learning have considerably advanced robotic manipulation. However, current techniques in imitation learning can suffer from poor generalization, limiting performance e…

Engineering Preprint PDF DOI

Look, Focus, Act: Efficient and Robust Robot Learning via Human Gaze and Foveated Vision Transformers

Ian Chuang, Jinyu Zou, Andrew Lee, Dechen Gao, Iman Soltani · 2025

Human vision is a highly active process driven by gaze, which directs attention to task-relevant regions through foveation, dramatically reducing visual processing. In contrast, robot learning systems…

Engineering Preprint PDF DOI

Diffusion-Based Imaginative Coordination for Bimanual Manipulation

Huilin Xu, Jian Ding, Jiakun Xu, Ruixiang Wang, Jun Chen, Jinjie Mai, Yanwei Fu, Bernard Ghanem, Feng Xu, Mohamed Elhoseiny · 2025

Bimanual manipulation is crucial in robotics, enabling complex tasks in industrial automation and household services. However, it poses significant challenges due to the high-dimensional action space …

Engineering Preprint PDF DOI

Pronunciation Editing for Finnish Speech using Phonetic Posteriorgrams

Zirui Li, Lauri Juvela, Mikko Kurimo · 2025

Synthesizing second-language (L2) speech is potentially highly valued for L2 language learning experience and feedback. However, due to the lack of L2 speech synthesis datasets, it is difficult to syn…

Engineering Preprint PDF DOI

I Know You're Listening: Adaptive Voice for HRI

Paige Tuttosi · 2025

While the use of social robots for language teaching has been explored, there remains limited work on task-specific synthesized voices for language teaching robots. Given that language is a verbal t…

Engineering Preprint PDF DOI

EmojiVoice: Towards long-term controllable expressivity in robot speech

Paige Tuttosi, Shivam Mehta, Zachary Syvenky, Bermet Burkanova, Gustav Eje Henter, Angelica Lim · 2025

Humans vary their expressivity when speaking for extended periods to maintain engagement with their listener. Although social robots tend to be deployed with "expressive" joyful voices, they lack th…

Engineering Preprint PDF DOI

SGN-CIRL: Scene Graph-based Navigation with Curriculum, Imitation, and Reinforcement Learning

Nikita Oskolkov, Huzhenyu Zhang, Dmitry Makarov, Dmitry Yudin, Aleksandr Panov · 2025

The 3D scene graph models spatial relationships between objects, enabling the agent to efficiently navigate in a partially observable environment and predict the location of the target object. This pap…

Engineering Preprint PDF DOI

Splatting Physical Scenes: End-to-End Real-to-Sim from Imperfect Robot Data

Ben Moran, Mauro Comi, Arunkumar Byravan, Steven Bohez, Tom Erez, Zhibin Li, Leonard Hasenclever · 2025

Creating accurate, physical simulations directly from real-world robot motion holds great value for safe, scalable, and affordable robot learning, yet remains exceptionally challenging. Real robot dat…

Engineering Preprint PDF DOI

Federated Learning-Distillation Alternation for Resource-Constrained IoT

Rafael Valente da Silva, Onel L. Alcaraz Lopez, Richard Demo Souza · 2025

Federated learning (FL) faces significant challenges in Internet of Things (IoT) networks due to device limitations in energy and communication resources, especially when considering the large size of…

Engineering Preprint PDF DOI

GPA-RAM: Grasp-Pretraining Augmented Robotic Attention Mamba for Spatial Task Learning

Juyi Sheng, Yangjun Liu, Sheng Xu, Zhixin Yang, Mengyuan Liu · 2025

Task failures in prior fine-grained robotic manipulation methods often stem from suboptimal initial grasping, which is critical for subsequent manipulation and reducing the requirement for complex pos…
