Mateja Dumbovic in Engineering — Research Repository

Engineering Preprint PDF DOI

MALEFA: Multi-grAnularity Learning and Effective False Alarm Suppression for Zero-shot Keyword Spotting

Lo-Ya Li, Tien-Hong Lo, Jeih-Weih Hung, Shih-Chieh Huang, Berlin Chen · 2026

User-defined keyword spotting (KWS) without resorting to domain-specific pre-labeled training data is of fundamental importance in building adaptable and personalized voice interfaces. However, such s…

Engineering Preprint PDF DOI

How Open is Open TTS? A Practical Evaluation of Open Source TTS Tools

Teodora Ragman, Adrian Bogdan Stanea, Horia Cucu, Adriana Stan · 2026

Open-source text-to-speech (TTS) frameworks have emerged as highly adaptable platforms for developing speech synthesis systems across a wide range of languages. However, their applicability is not uni…

Engineering Preprint PDF DOI

Towards Scalable Probabilistic Human Motion Prediction with Gaussian Processes for Safe Human-Robot Collaboration

Jinger Chong, Xiaotong Zhang, Kamal Youcef-Toumi · 2026

Accurate human motion prediction with well-calibrated uncertainty is critical for safe human-robot collaboration (HRC), where robots must anticipate and react to human movements in real time. We propo…

Engineering Preprint PDF DOI

MATE: Matryoshka Audio-Text Embeddings for Open-Vocabulary Keyword Spotting

Youngmoon Jung, Myunghun Jung, Joon-Young Yang, Yong-Hyeok Lee, Jaeyoung Roh, Hoon-Young Cho · 2026

Open-vocabulary keyword spotting (KWS) with text-based enrollment has emerged as a flexible alternative to fixed-phrase triggers. Prior utterance-level matching methods, from an embedding-learning sta…

Engineering Preprint PDF DOI

Pronunciation Editing for Finnish Speech using Phonetic Posteriorgrams

Zirui Li, Lauri Juvela, Mikko Kurimo · 2025

Synthesizing second-language (L2) speech is potentially highly valued for L2 language learning experience and feedback. However, due to the lack of L2 speech synthesis datasets, it is difficult to syn…

Engineering Preprint PDF DOI

MATER: Multi-level Acoustic and Textual Emotion Representation for Interpretable Speech Emotion Recognition

Hyo Jin Jon, Longbin Jin, Hyuntaek Jung, Hyunseo Kim, Donghun Min, Eun Yi Kim · 2025

This paper presents our contributions to the Speech Emotion Recognition in Naturalistic Conditions (SERNC) Challenge, where we address categorical emotion recognition and emotional attribute predictio…

Engineering Preprint PDF DOI

I Know You're Listening: Adaptive Voice for HRI

Paige Tuttosi · 2025

While the use of social robots for language teaching has been explored, there remains limited work on task-specific synthesized voices for language teaching robots. Given that language is a verbal t…

Engineering Preprint PDF DOI

EmojiVoice: Towards long-term controllable expressivity in robot speech

Paige Tuttosi, Shivam Mehta, Zachary Syvenky, Bermet Burkanova, Gustav Eje Henter, Angelica Lim · 2025

Humans vary their expressivity when speaking for extended periods to maintain engagement with their listener. Although social robots tend to be deployed with "expressive" joyful voices, they lack th…

Engineering Preprint PDF DOI

Seeing Beyond Words: MatVQA for Challenging Visual-Scientific Reasoning in Materials Science

Sifan Wu, Huan Zhang, Yizhan Li, Farshid Effaty, Amirreza Ataei, Bang Liu · 2025

The emergence of Multimodal Large Language Models (MLLMs) that integrate vision and language modalities has unlocked new potentials for scientific reasoning, outperforming prior benchmarks in both nat…

Engineering Preprint PDF DOI

A Passive Mechanical Add-on for Treadmill Exercise (P-MATE) in Stroke Rehabilitation

Irene L. Y. Beck, Belle C. Hopmans, Bram Haanen, Levi Kieft, Heike Vallery, Laura Marchal-Crespo, Katherine L. Poggensee · 2025

Robotic rehabilitation can deliver high-dose gait therapy and improve motor function after a stroke. However, for many devices, high costs and lengthy setup times limit clinical adoption. Thus, we des…

Engineering Preprint PDF DOI

In-Materia Speech Recognition

Mohamadreza Zolfagharinejad, Julian Buchel, Lorenzo Cassola, Sachin Kinge, Ghazi Sarwat Syed, Abu Sebastian, Wilfred G. van der Wiel · 2024

With the rise of decentralized computing, as in the Internet of Things, autonomous driving, and personalized healthcare, it is increasingly important to process time-dependent signals at the edge effi…

Engineering Preprint PDF DOI

Neural Coordination and Capacity Control for Inventory Management

Carson Eisenach, Udaya Ghai, Dhruv Madeka, Kari Torkkola, Dean Foster, Sham Kakade · 2024

This paper addresses the capacitated periodic review inventory control problem, focusing on a retailer managing multiple products with limited shared resources, such as storage or inbound labor at a f…

Engineering Preprint PDF DOI

The UmboMic: A PVDF Cantilever Microphone

Aaron J. Yeiser, Emma F. Wawrzynek, John Z. Zhang, Lukas Graf, Christopher I. McHugh, Ioannis Kymissis, Elizabeth S. Olson, Jeffrey H. Lang, Hideko Heidi Nakajima · 2023

Objective: We present the "UmboMic," a prototype piezoelectric cantilever microphone designed for future use with totally-implantable cochlear implants. Methods: The UmboMic sensor is made from polyvi…

Engineering Preprint PDF DOI

Matcha-TTS: A fast TTS architecture with conditional flow matching

Shivam Mehta, Ruibo Tu, Jonas Beskow, Eva Szekely, Gustav Eje Henter · 2023

We introduce Matcha-TTS, a new encoder-decoder architecture for speedy TTS acoustic modelling, trained using optimal-transport conditional flow matching (OT-CFM). This yields an ODE-based decoder capa…

Engineering Preprint PDF DOI

Chat with the Environment: Interactive Multimodal Perception Using Large Language Models

Xufeng Zhao, Mengdi Li, Cornelius Weber, Muhammad Burhan Hafez, Stefan Wermter · 2023

Programming robot behavior in a complex world faces challenges on multiple levels, from dextrous low-level skills to high-level planning and reasoning. Recent pre-trained Large Language Models (LLMs) …

Engineering Preprint PDF DOI

On the Uplink SINR Meta Distribution of UAV-assisted Wireless Networks

Yujie Qin, Mustafa A. Kishk, Mohamed-Slim Alouini · 2023

This letter studies the signal-to-interference-plus-noise (SINR) meta distribution of uplink transmission of UAV-enabled wireless networks with inversion power control. Within a framework of stochasti…

Engineering Preprint PDF DOI

On the Performance of Data Compression in Clustered Fog Radio Access Networks

Haonan Hu, Yan Jiang, Jiliang Zhang, Yanan Zheng, Qianbin Chen, Jie Zhang · 2022

The fog-radio-access-network (F-RAN) has been proposed to address the strict latency requirements, which offloads computation tasks generated in user equipments (UEs) to the edge to reduce the process…

Engineering Preprint PDF DOI

Faithful Euclidean Distance Field from Log-Gaussian Process Implicit Surfaces

Lan Wu, Ki Myung Brian Lee, Liyang Liu, Teresa Vidal-Calleja · 2020

In this letter, we introduce the Log-Gaussian Process Implicit Surface (Log-GPIS), a novel continuous and probabilistic mapping representation suitable for surface reconstruction and local navigation.…

Engineering Preprint PDF DOI

Spatiotemporal Modelling of Multi-Gateway LoRa Networks with Imperfect SF Orthogonality

Yathreb Bouazizi, Fatma Benkhelifa, Julie McCann · 2020

Meticulous modelling and performance analysis of Low-Power Wide-Area (LPWA) networks are essential for large scale dense Internet-of-Things (IoT) deployments. As Long Range (LoRa) is currently one of …

Engineering Preprint PDF DOI

User Association for Offloading in Heterogeneous Network Based on Matern Cluster Process

Yuxuan Xie, Xuefei Zhang, Qimei Cui, Yanyan Lu · 2018

Future mobile networks are converging toward heterogeneous multi-tier networks, where various classes of base stations (BS) are deployed based on user demand. So it is quite necessary to utilize the B…
