Ryoshi Hotta in Engineering — Research Repository

Engineering Preprint PDF DOI

Cross-Linguistic Rhythmic and Spectral Feature-Based Analysis of Nyishi and Adi: Two Under-Resourced Languages of Arunachal Pradesh

Deepshikha Gogoi, Parismita Gogoi, Yang Saring · 2026

Under-resourced languages remain underrepresented in quantitative rhythm research, particularly in systematic intra-branch analysis of acoustic differentiation within closely related linguistic groups.…

RoSHI: A Versatile Robot-oriented Suit for Human Data In-the-Wild

Wenjing Margaret Mao, Jefferson Ng, Luyang Hu, Daniel Gehrig, Antonio Loquercio · 2026

Scaling up robot learning will likely require human data containing rich and long-horizon interactions in the wild. Existing approaches for collecting such data trade off portability, robustness to oc…

RHOSI: Efficient Anti-Jamming Resource Allocation with Holographic Surfaces in UAV-enabled ISAC

Jalal Jalali, Mostafa Darabi, Rodrigo C. de Lamare · 2026

This paper investigates the susceptibility of Integrated Sensing and Communication (ISAC) systems to hostile jamming, focusing on an aerial Reconfigurable Holographic Surface (RHS)-aided unmanned aeri…

Privacy-Preserving End-to-End Full-Duplex Speech Dialogue Models

Nikita Kuzmin, Tao Zhong, Jiajun Deng, Yingke Zhu, Tristan Tsoi, Tianxiang Cao, Simon Lui, Kong Aik Lee, Eng Siong Chng · 2026

End-to-end full-duplex speech models feed user audio through an always-on LLM backbone, yet the speaker privacy implications of their hidden representations remain unexamined. Following the VoicePriva…

Learning What To Hear: Boosting Sound-Source Association For Robust Audiovisual Instance Segmentation

Jinbae Seo, Hyeongjun Kwon, Kwonyoung Kim, Jiyoung Lee, Kwanghoon Sohn · 2025

Audiovisual instance segmentation (AVIS) requires accurately localizing and tracking sounding objects throughout video sequences. Existing methods suffer from visual bias stemming from two fundamental…

Benchmarking Massively Parallelized Multi-Task Reinforcement Learning for Robotics Tasks

Viraj Joshi, Zifan Xu, Bo Liu, Peter Stone, Amy Zhang · 2025

Multi-task Reinforcement Learning (MTRL) has emerged as a critical training paradigm for applying reinforcement learning (RL) to a set of complex real-world robotic tasks, which demands a generalizabl…

FD-Bench: A Full-Duplex Benchmarking Pipeline Designed for Full Duplex Spoken Dialogue Systems

Yizhou Peng, Yi-Wen Chao, Dianwen Ng, Yukun Ma, Chongjia Ni, Bin Ma, Eng Siong Chng · 2025

Full-duplex spoken dialogue systems (FDSDS) enable more natural human-machine interactions by allowing real-time user interruptions and backchanneling, compared to traditional SDS that rely on turn-ta…

New Test-Time Scenario for Biosignal: Concept and Its Approach

Yong-Yeon Jo, Byeong Tak Lee, Beom Joon Kim, Jeong-Ho Hong, Hak Seung Lee, Joon-myoung Kwon · 2024

Online Test-Time Adaptation (OTTA) enhances model robustness by updating pre-trained models with unlabeled data during testing. In healthcare, OTTA is vital for real-time tasks like predicting blood p…

Moshi: a speech-text foundation model for real-time dialogue

Alexandre Defossez, Laurent Mazare, Manu Orsini, Amelie Royer, Patrick Perez, Herve Jegou, Edouard Grave, Neil Zeghidour · 2024

We introduce Moshi, a speech-text foundation model and full-duplex spoken dialogue framework. Current systems for spoken dialogue rely on pipelines of independent components, namely voice activity det…

Gotta catch 'em all, safely! Aerial-deployed soft underwater gripper

Luca Romanello, Daniel Joseph Amir, Heinrich Stengel, Mirko Kovac, Sophie F. Armanini · 2024

Underwater soft grippers exhibit potential for applications such as monitoring, research, and object retrieval. However, existing underwater gripping techniques frequently cause disturbances to ecosys…

Advancing Frame-Dropping in Multi-Object Tracking-by-Detection Systems Through Event-Based Detection Triggering

Matti Henning, Michael Buchholz, Klaus Dietmayer · 2023

With rising computational requirements, modern automated vehicles (AVs) often consider trade-offs between energy consumption and perception performance, potentially jeopardizing their safe operation. F…

MinkSORT: A 3D deep feature extractor using sparse convolutions to improve 3D multi-object tracking in greenhouse tomato plants

David Rapado-Rincon, Eldert J. van Henten, Gert Kootstra · 2023

The agro-food industry is turning to robots to address the challenge of labour shortage. However, agro-food environments pose difficulties for robots due to high variation and occlusions. In the prese…

The Impact of Frame-Dropping on Performance and Energy Consumption for Multi-Object Tracking

Matti Henning, Michael Buchholz, Klaus Dietmayer · 2023

The safety of automated vehicles (AVs) relies on the representation of their environment. Consequently, state-of-the-art AVs employ potent sensor systems to achieve the best possible environment repre…

Voice Conversion Based Speaker Normalization for Acoustic Unit Discovery

Thomas Glarner, Janek Ebbers, Reinhold Haeb-Umbach · 2021

Discovering speaker independent acoustic units purely from spoken input is known to be a hard problem. In this work we propose an unsupervised speaker normalization technique prior to unit discovery. …

Unsupervised Acoustic Unit Discovery by Leveraging a Language-Independent Subword Discriminative Feature Representation

Siyuan Feng, Piotr Zelasko, Laureano Moro-Velazquez, Odette Scharenborg · 2021

This paper tackles automatically discovering phone-like acoustic units (AUD) from unlabeled speech data. Past studies usually proposed single-step approaches. We propose a two-stage approach: the firs…

A Hierarchical Subspace Model for Language-Attuned Acoustic Unit Discovery

Bolaji Yusuf, Lucas Ondel, Lukas Burget, Jan Cernocky, Murat Saraclar · 2020

In this work, we propose a hierarchical subspace model for acoustic unit discovery. In this approach, we frame the task as one of learning embeddings on a low-dimensional phonetic subspace, and simult…

Generative Adversarial Training Data Adaptation for Very Low-resource Automatic Speech Recognition

Kohei Matsuura, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara · 2020

It is important to transcribe and archive speech data of endangered languages for preserving heritages of verbal culture and automatic speech recognition (ASR) is a powerful tool to facilitate this pr…

Yottixel -- An Image Search Engine for Large Archives of Histopathology Whole Slide Images

S. Kalra, C. Choi, S. Shah, L. Pantanowitz, H.R. Tizhoosh · 2019

With the emergence of digital pathology, searching for similar images in large archives has gained considerable attention. Image retrieval can provide pathologists with unprecedented access to the evi…
