W.J. Munro in Engineering — Research Repository

Engineering Preprint PDF DOI

Teaching the Teachers: Boosting unsupervised domain adaptation in speech recognition by ensemble update

Rehan Ahmad, Muhammad Umar Farooq, Qihang Feng, Thomas Hain · 2026

Speech recognition systems often struggle with data domains that were not included in training. To address this, unsupervised domain adaptation has been explored with ensemble and multi-stage…

Neural Forward Filtering for Speaker-Image Separation

Jingqi Sun, Shulin He, Ruizhe Pang, Zhong-Qiu Wang · 2025

We address monaural multi-speaker-image separation in reverberant conditions, aiming at separating mixed speakers but preserving the reverberation of each speaker. A straightforward approach for this …

Adversarial Deep Metric Learning for Cross-Modal Audio-Text Alignment in Open-Vocabulary Keyword Spotting

Youngmoon Jung, Yong-Hyeok Lee, Myunghun Jung, Jaeyoung Roh, Chang Woo Han, Hoon-Young Cho · 2025

For text enrollment-based open-vocabulary keyword spotting (KWS), acoustic and text embeddings are typically compared at either the phoneme or utterance level. To facilitate this, we optimize acoustic…

Modelowanie nieliniowej charakterystyki szerokopasmowych wzmacniaczy radiowych o zmiennym napięciu zasilania; Modeling Nonlinear Characteristics of Wideband Radio Frequency Amplifiers with Variable Supply Voltage

Kornelia Kostrzewska, Paweł Kryszkiewicz · 2025

The work aims to propose a new nonlinear characteristics model for a wideband radio amplifier of variable supply voltage. An extended Rapp model proposal is presented. The proposed model has been veri…

Noise-robust Speech Separation with Fast Generative Correction

Helin Wang, Jesus Villalba, Laureano Moro-Velazquez, Jiarui Hai, Thomas Thebaud, Najim Dehak · 2024

Speech separation, the task of isolating multiple speech sources from a mixed audio signal, remains challenging in noisy environments. In this paper, we propose a generative correction method to enhan…

Relational Proxy Loss for Audio-Text based Keyword Spotting

Youngmoon Jung, Seungjin Lee, Joon-Young Yang, Jaeyoung Roh, Chang Woo Han, Hoon-Young Cho · 2024

In recent years, there has been an increasing focus on user convenience, leading to increased interest in text-based keyword enrollment systems for keyword spotting (KWS). Since the system utilizes te…

Minimum Latency Training of Sequence Transducers for Streaming End-to-End Speech Recognition

Yusuke Shinohara, Shinji Watanabe · 2022

Sequence transducers, such as the RNN-T and the Conformer-T, are among the most promising models for end-to-end speech recognition, especially in streaming scenarios where both latency and accuracy ar…

MMS-MSG: A Multi-purpose Multi-Speaker Mixture Signal Generator

Tobias Cord-Landwehr, Thilo von Neumann, Christoph Boeddeker, Reinhold Haeb-Umbach · 2022

The scope of speech enhancement has changed from a monolithic view of single, independent tasks, to a joint processing of complex conversational speech recordings. Training and evaluation of these sin…

DEFORMER: Coupling Deformed Localized Patterns with Global Context for Robust End-to-end Speech Recognition

Jiamin Xie, John H.L. Hansen · 2022

Convolutional neural networks (CNN) have improved speech recognition performance greatly by exploiting localized time-frequency patterns. But these patterns are assumed to appear in symmetric and rigi…

CMOS Circuit Implementation of Spiking Neural Network for Pattern Recognition Using On-chip Unsupervised STDP Learning

Sahibia Kaur Vohra, Sherin A Thomas, Mahendra Sakare, Devarshi Mrinal Das · 2022

Computation on a large volume of data at high speed and low power requires energy-efficient computing architectures. Spiking neural network (SNN) with bio-inspired spike-timing-dependent plasticity le…

Asymmetric Proxy Loss for Multi-View Acoustic Word Embeddings

Myunghun Jung, Hoirin Kim · 2022

Acoustic word embeddings (AWEs) are discriminative representations of speech segments, and learned embedding space reflects the phonetic similarity between words. With multi-view learning, where text …

Real-time Speaker counting in a cocktail party scenario using Attention-guided Convolutional Neural Network

Midia Yousefi, John H.L. Hansen · 2021

Most current speech technology systems are designed to operate well even in the presence of multiple active speakers. However, most solutions assume that the number of concurrent speakers is known. Un…

Graph-PIT: Generalized permutation invariant training for continuous separation of arbitrary numbers of speakers

Thilo von Neumann, Keisuke Kinoshita, Christoph Boeddeker, Marc Delcroix, Reinhold Haeb-Umbach · 2021

Automatic transcription of meetings requires handling of overlapped speech, which calls for continuous speech separation (CSS) systems. The uPIT criterion was proposed for utterance-level separation w…

Relaxed Attention: A Simple Method to Boost Performance of End-to-End Automatic Speech Recognition

Timo Lohrenz, Patrick Schwarz, Zhengyang Li, Tim Fingscheidt · 2021

Recently, attention-based encoder-decoder (AED) models have shown high performance for end-to-end automatic speech recognition (ASR) across several tasks. Addressing overconfidence in such models, in …

Head-synchronous Decoding for Transformer-based Streaming ASR

Mohan Li, Catalin Zorila, Rama Doddipatla · 2021

Online Transformer-based automatic speech recognition (ASR) systems have been extensively studied due to the increasing demand for streaming applications. Recently proposed Decoder-end Adaptive Comput…

Multi-Encoder Learning and Stream Fusion for Transformer-Based End-to-End Automatic Speech Recognition

Timo Lohrenz, Zhengyang Li, Tim Fingscheidt · 2021

Stream fusion, also known as system combination, is a common technique in automatic speech recognition for traditional hybrid hidden Markov model approaches, yet mostly unexplored for modern deep neur…

Unidirectional Memory-Self-Attention Transducer for Online Speech Recognition

Jian Luo, Jianzong Wang, Ning Cheng, Jing Xiao · 2021

Self-attention models have been successfully applied in end-to-end speech recognition systems, greatly improving recognition accuracy. However, such attention-based models canno…

Do End-to-End Speech Recognition Models Care About Context?

Lasse Borgholt, Jakob Drachmann Havtorn, Zeljko Agic, Anders Søgaard, Lars Maaløe, Christian Igel · 2021

The two most common paradigms for end-to-end speech recognition are connectionist temporal classification (CTC) and attention-based encoder-decoder (AED) models. It has been argued that the latter is …

Deep Learning based Multi-Source Localization with Source Splitting and its Effectiveness in Multi-Talker Speech Recognition

Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu · 2021

Multi-source localization is an important and challenging technique for multi-talker conversation analysis. This paper proposes a novel supervised learning method using deep neural networks to estimat…

Efficient Neural Architecture Search for End-to-end Speech Recognition via Straight-Through Gradients

Huahuan Zheng, Keyu An, Zhijian Ou · 2020

Neural Architecture Search (NAS), the process of automating architecture engineering, is an appealing next step to advancing end-to-end Automatic Speech Recognition (ASR), replacing expert-designed ne…
