Osbert Bastani in Engineering — Research Repository

Engineering Preprint PDF DOI

SPG-Codec: Exploring the Role and Boundaries of Semantic Priors in Ultra-Low-Bitrate Neural Speech Coding

Mingyu Zhao, Zijian Lin, Kun Wei, Zhiyong Wu · 2026

Conventional neural speech codecs suffer from severe intelligibility degradation at ultra-low bitrates, where the bottleneck transitions from acoustic distortion to semantic loss. To address this issu…

Pedestrians play chicken with an autonomous vehicle

Rakshit Soni, Charles Fox · 2026

Automated vehicles (AVs) are commonly programmed to yield unconditionally to pedestrians in the interest of safety. However, this design choice can give rise to the Freezing Robot Problem in which ped…

VisG AV-HuBERT: Viseme-Guided AV-HuBERT

Aristeidis Papadopoulos, Rishabh Jain, Naomi Harte · 2026

Audio-Visual Speech Recognition (AVSR) systems nowadays integrate Large Language Model (LLM) decoders with transformer-based encoders, achieving state-of-the-art results. However, the relative contrib…

HARNESS: Lightweight Distilled Arabic Speech Foundation Models

Vrunda N. Sukhadia, Shammur Absar Chowdhury · 2026

Large self-supervised speech (SSL) models achieve strong downstream performance, but their size limits deployment in resource-constrained settings. We present HArnESS, an Arabic-centric self-supervise…

Bootstrapping Audiovisual Speech Recognition in Zero-AV-Resource Scenarios with Synthetic Visual Data

Pol Buitrago, Pol Galvez, Oriol Pareras, Javier Hernando · 2026

Audiovisual speech recognition (AVSR) combines acoustic and visual cues to improve transcription robustness under challenging conditions but remains out of reach for most under-resourced languages due…

Privacy-Preserving End-to-End Full-Duplex Speech Dialogue Models

Nikita Kuzmin, Tao Zhong, Jiajun Deng, Yingke Zhu, Tristan Tsoi, Tianxiang Cao, Simon Lui, Kong Aik Lee, Eng Siong Chng · 2026

End-to-end full-duplex speech models feed user audio through an always-on LLM backbone, yet the speaker privacy implications of their hidden representations remain unexamined. Following the VoicePriva…

Right in Time: Reactive Reasoning in Regulated Traffic Spaces

Simon Kohaut, Benedict Flade, Julian Eggert, Kristian Kersting, Devendra Singh Dhami · 2026

Exact inference in probabilistic First-Order Logic offers a promising yet computationally costly approach for regulating the behavior of autonomous agents in shared traffic spaces. While prior methods…

Extending 2D foundational DINOv3 representations to 3D segmentation of neonatal brain MR images

Annayah Usman, Behraj Khan, Tahir Qasim Syed · 2026

Precise volumetric delineation of hippocampal structures is essential for quantifying neurodevelopmental trajectories in pre-term and term infants, where subtle morphological variations may carry prog…

An Explainable Agentic AI Framework for Uncertainty-Aware and Abstention-Enabled Acute Ischemic Stroke Imaging Decisions

Md Rashadul Islam · 2026

Artificial intelligence models have shown strong potential in acute ischemic stroke imaging, particularly for lesion detection and segmentation using computed tomography and magnetic resonance imaging…

Socially aware navigation for mobile robots: a survey on deep reinforcement learning approaches

Ibrahim Khalil Kabir, Muhammad Faizan Mysorewala · 2025

Socially aware navigation is a fast-evolving research area in robotics that enables robots to move within human environments while adhering to the implicit human social norms. The advent of Deep Reinf…

GNN-Enabled Robust Hybrid Beamforming with Score-Based CSI Generation and Denoising

Yuhang Li, Yang Lu, Bo Ai, Zhiguo Ding, Dusit Niyato, Arumugam Nallanathan · 2025

Accurate Channel State Information (CSI) is critical for Hybrid Beamforming (HBF) tasks. However, obtaining high-resolution CSI remains challenging in practical wireless communication systems. To addr…

Interpreting the Role of Visemes in Audio-Visual Speech Recognition

Aristeidis Papadopoulos, Naomi Harte · 2025

Audio-Visual Speech Recognition (AVSR) models have surpassed their audio-only counterparts in terms of performance. However, the interpretability of AVSR systems, particularly the role of the visual m…

BabyHuBERT: Multilingual Self-Supervised Learning for Segmenting Speakers in Child-Centered Long-Form Recordings

Theo Charlot, Tarek Kunze, Maxime Poli, Alejandrina Cristia, Emmanuel Dupoux, Marvin Lavechin · 2025

Child-centered daylong recordings are essential for studying early language development, but existing speech models trained on clean adult data perform poorly due to acoustic and linguistic difference…

Multi-pathology Chest X-ray Classification with Rejection Mechanisms

Yehudit Aperstein, Amit Tzahar, Alon Gottlib, Tal Verber, Ravit Shagan Damti, Alexander Apartsin · 2025

Overconfidence in deep learning models poses a significant risk in high-stakes medical imaging tasks, particularly in multi-label classification of chest X-rays, where multiple co-occurring pathologie…

6G Resilience -- White Paper

Hirley Alves, Nurul H. Mahmood, Onel L. A. Lopez, Sumudu Samarakoon, Seppo Yrjola, Matti Latva-Aho, Markku Juntti, Ari Pouttu, Armin Dekorsy, Arthur Sousa de Sena, Aydin Sezgin, Bho Matthiesen, Chafika Benzaid, Chathuranga Weeraddana, David Hutchison, Dileepa Marasinghe, Doganalp Ergenc, Eduard Jorswieck, Erkki Harjula, Falko Dressler, Harri Saarnisaari, Italo Atzeni, Jaap Van De Beek, Jacek Rak, Konstantin Mikhaylov, Lauri Loven, Madhusanka Liyanage, Marcos Katz, Marja Matinmikko-Blue, Mehdi Rasti, Mika Ylianttila, Nhan Nguyen, Pawani Porambage, Petar Popovski, Petri Ahokangas, Premanandana Rajatheva, Robert-Jeron Reifert, Tharaka Hewa, Tommy Svensson · 2025

6G must be designed to withstand, adapt to, and evolve amid prolonged, complex disruptions. Mobile networks' shift from efficiency-first to sustainability-aware has motivated this white paper to asser…

Zero-Shot KWS for Children's Speech using Layer-Wise Features from SSL Models

Subham Kutum, Abhijit Sinha, Hemant Kumar Kathania, Sudarsana Reddy Kadiri, Mahesh Chandra Govil · 2025

Numerous methods have been proposed to enhance Keyword Spotting (KWS) in adult speech, but children's speech presents unique challenges for KWS systems due to its distinct acoustic and linguistic char…

AT-CXR: Uncertainty-Aware Agentic Triage for Chest X-rays

Xueyang Li, Mingze Jiang, Gelei Xu, Jun Xia, Mengzhao Jia, Danny Chen, Yiyu Shi · 2025

Agentic AI is advancing rapidly, yet truly autonomous medical-imaging triage, where a system decides when to stop, escalate, or defer under real constraints, remains relatively underexplored. To addre…

Towards interpretable emotion recognition: Identifying key features with machine learning

Yacouba Kaloga, Ina Kodrasi · 2025

Unsupervised methods, such as wav2vec2 and HuBERT, have achieved state-of-the-art performance in audio tasks, leading to a shift away from research on interpretable features. However, the lack of inte…

Large Language Model Guided Decoding for Self-Supervised Speech Recognition

Eyal Cohen, Bhiksha Raj, Joseph Keshet · 2025

Self-supervised automatic speech recognition (SSL-ASR) is an ASR approach that uses speech encoders pretrained on large amounts of unlabeled audio (e.g., wav2vec2.0 or HuBERT) and then fine-tunes them…

Musical Source Separation Bake-Off: Comparing Objective Metrics with Human Perception

Noah Jaffe, John Ashley Burgoyne · 2025

Music source separation aims to extract individual sound sources (e.g., vocals, drums, guitar) from a mixed music recording. However, evaluating the quality of separated audio remains challenging, as …
