18+ open-access research outputs.
Under-resourced languages remain underrepresented in quantitative rhythm research,particularly in systematic intra-branch analysis of acoustic differentiation within closely related linguistic groups.โฆ
Scaling up robot learning will likely require human data containing rich and long-horizon interactions in the wild. Existing approaches for collecting such data trade off portability, robustness to ocโฆ
This paper investigates the susceptibility of Integrated Sensing and Communication (ISAC) systems to hostile jamming, focusing on an aerial Reconfigurable Holographic Surface (RHS)-aided unmanned aeriโฆ
End-to-end full-duplex speech models feed user audio through an always-on LLM backbone, yet the speaker privacy implications of their hidden representations remain unexamined. Following the VoicePrivaโฆ
Audiovisual instance segmentation (AVIS) requires accurately localizing and tracking sounding objects throughout video sequences. Existing methods suffer from visual bias stemming from two fundamentalโฆ
Multi-task Reinforcement Learning (MTRL) has emerged as a critical training paradigm for applying reinforcement learning (RL) to a set of complex real-world robotic tasks, which demands a generalizablโฆ
Full-duplex spoken dialogue systems (FDSDS) enable more natural human-machine interactions by allowing real-time user interruptions and backchanneling, compared to traditional SDS that rely on turn-taโฆ
Online Test-Time Adaptation (OTTA) enhances model robustness by updating pre-trained models with unlabeled data during testing. In healthcare, OTTA is vital for real-time tasks like predicting blood pโฆ
We introduce Moshi, a speech-text foundation model and full-duplex spoken dialogue framework. Current systems for spoken dialogue rely on pipelines of independent components, namely voice activity detโฆ
Underwater soft grippers exhibit potential for applications such as monitoring, research, and object retrieval. However, existing underwater gripping techniques frequently cause disturbances to ecosysโฆ
With rising computational requirements modern automated vehicles (AVs) often consider trade-offs between energy consumption and perception performance, potentially jeopardizing their safe operation. Fโฆ
The agro-food industry is turning to robots to address the challenge of labour shortage. However, agro-food environments pose difficulties for robots due to high variation and occlusions. In the preseโฆ
The safety of automated vehicles (AVs) relies on the representation of their environment. Consequently, state-of-the-art AVs employ potent sensor systems to achieve the best possible environment repreโฆ
Discovering speaker independent acoustic units purely from spoken input is known to be a hard problem. In this work we propose an unsupervised speaker normalization technique prior to unit discovery. โฆ
This paper tackles automatically discovering phone-like acoustic units (AUD) from unlabeled speech data. Past studies usually proposed single-step approaches. We propose a two-stage approach: the firsโฆ
In this work, we propose a hierarchical subspace model for acoustic unit discovery. In this approach, we frame the task as one of learning embeddings on a low-dimensional phonetic subspace, and simultโฆ
It is important to transcribe and archive speech data of endangered languages for preserving heritages of verbal culture and automatic speech recognition (ASR) is a powerful tool to facilitate this prโฆ
With the emergence of digital pathology, searching for similar images in large archives has gained considerable attention. Image retrieval can provide pathologists with unprecedented access to the eviโฆ
Free open-access publishing with Google Scholar indexing.
Submission Guide โ