Expertini Research

Browse Research Papers

34+ open-access research outputs.

Showing 34 results for "w.j. munro" in Engineering
Engineering · Preprint

Teaching the Teachers: Boosting unsupervised domain adaptation in speech recognition by ensemble update

Rehan Ahmad, Muhammad Umar Farooq, Qihang Feng, Thomas Hain · 2026

Speech recognition systems often struggle with data domains that have not been included in the training. To address this, unsupervised domain adaptation has been explored with ensemble and multi-stage…

Engineering · Preprint

Neural Forward Filtering for Speaker-Image Separation

Jingqi Sun, Shulin He, Ruizhe Pang, Zhong-Qiu Wang · 2025

We address monaural multi-speaker-image separation in reverberant conditions, aiming at separating mixed speakers but preserving the reverberation of each speaker. A straightforward approach for this…

Engineering · Preprint

Adversarial Deep Metric Learning for Cross-Modal Audio-Text Alignment in Open-Vocabulary Keyword Spotting

Youngmoon Jung, Yong-Hyeok Lee, Myunghun Jung, Jaeyoung Roh, Chang Woo Han, Hoon-Young Cho · 2025

For text enrollment-based open-vocabulary keyword spotting (KWS), acoustic and text embeddings are typically compared at either the phoneme or utterance level. To facilitate this, we optimize acoustic…

Engineering · Preprint

Modeling Nonlinear Characteristics of Wideband Radio Frequency Amplifiers with Variable Supply Voltage (Polish title: Modelowanie nieliniowej charakterystyki szerokopasmowych wzmacniaczy radiowych o zmiennym napięciu zasilania)

Kornelia Kostrzewska, Paweł Kryszkiewicz · 2025

This work proposes a new model of the nonlinear characteristics of a wideband radio amplifier with a variable supply voltage, in the form of an extended Rapp model. The proposed model has been veri…

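
Background sketch only: the snippet above does not describe the authors' extension, so this shows the classic Rapp model it builds on. The classic Rapp model gives the AM/AM characteristic of a solid-state amplifier that is linear at low drive and saturates smoothly, with no AM/PM term. A minimal Python version, assuming a non-negative input envelope amplitude `r`, a small-signal gain `gain`, a saturation level `a_sat`, and a smoothness factor `p`; these parameter names are chosen here for illustration and are not taken from the paper.

```python
def rapp_am_am(r: float, gain: float, a_sat: float, p: float) -> float:
    """Classic Rapp AM/AM curve: output amplitude for input amplitude r.

    Parameter names are illustrative, not taken from the paper.
    """
    if r < 0 or gain <= 0 or a_sat <= 0 or p <= 0:
        raise ValueError("expected r >= 0 and positive gain, a_sat, p")
    x = gain * r  # output an ideal linear amplifier would produce
    # Smoothly limit x toward a_sat; a larger p gives a sharper knee.
    return x / (1.0 + (x / a_sat) ** (2.0 * p)) ** (1.0 / (2.0 * p))
```

When `gain * r` equals `a_sat`, the output is `a_sat / 2 ** (1 / (2 * p))`, about 0.707 · `a_sat` for `p = 1`; as `p` grows, the curve approaches an ideal hard limiter.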
Engineering · Preprint

Noise-robust Speech Separation with Fast Generative Correction

Helin Wang, Jesus Villalba, Laureano Moro-Velazquez, Jiarui Hai, Thomas Thebaud, Najim Dehak · 2024

Speech separation, the task of isolating multiple speech sources from a mixed audio signal, remains challenging in noisy environments. In this paper, we propose a generative correction method to enhan…

Engineering · Preprint

Relational Proxy Loss for Audio-Text based Keyword Spotting

Youngmoon Jung, Seungjin Lee, Joon-Young Yang, Jaeyoung Roh, Chang Woo Han, Hoon-Young Cho · 2024

In recent years, there has been an increasing focus on user convenience, leading to increased interest in text-based keyword enrollment systems for keyword spotting (KWS). Since the system utilizes te…

Engineering · Preprint

Minimum Latency Training of Sequence Transducers for Streaming End-to-End Speech Recognition

Yusuke Shinohara, Shinji Watanabe · 2022

Sequence transducers, such as the RNN-T and the Conformer-T, are among the most promising models for end-to-end speech recognition, especially in streaming scenarios where both latency and accuracy ar…

Engineering · Preprint

MMS-MSG: A Multi-purpose Multi-Speaker Mixture Signal Generator

Tobias Cord-Landwehr, Thilo von Neumann, Christoph Boeddeker, Reinhold Haeb-Umbach · 2022

The scope of speech enhancement has changed from a monolithic view of single, independent tasks, to a joint processing of complex conversational speech recordings. Training and evaluation of these sin…

Engineering · Preprint

DEFORMER: Coupling Deformed Localized Patterns with Global Context for Robust End-to-end Speech Recognition

Jiamin Xie, John H.L. Hansen · 2022

Convolutional neural networks (CNN) have improved speech recognition performance greatly by exploiting localized time-frequency patterns. But these patterns are assumed to appear in symmetric and rigi…

Engineering · Preprint

CMOS Circuit Implementation of Spiking Neural Network for Pattern Recognition Using On-chip Unsupervised STDP Learning

Sahibia Kaur Vohra, Sherin A Thomas, Mahendra Sakare, Devarshi Mrinal Das · 2022

Computation on a large volume of data at high speed and low power requires energy-efficient computing architectures. Spiking neural network (SNN) with bio-inspired spike-timing-dependent plasticity le…

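
Background sketch only: the entry above implements learning in an analog CMOS circuit, and this is just the textbook pair-based STDP rule such circuits are inspired by, written in software. The weight change depends on the timing difference `delta_t = t_post - t_pre` between a post-synaptic and a pre-synaptic spike. The amplitudes `a_plus`, `a_minus`, the time constants `tau_plus_ms`, `tau_minus_ms`, and the weight bounds are illustrative defaults, not values from the paper.

```python
import math
from typing import Sequence


def stdp_weight_change(delta_t_ms: float,
                       a_plus: float = 0.01, a_minus: float = 0.012,
                       tau_plus_ms: float = 20.0, tau_minus_ms: float = 20.0) -> float:
    """Pair-based STDP update for one spike pair, with delta_t_ms = t_post - t_pre.

    Default constants are illustrative only.
    """
    if delta_t_ms > 0:
        # Pre fires before post (causal pairing): potentiation that decays with the gap.
        return a_plus * math.exp(-delta_t_ms / tau_plus_ms)
    if delta_t_ms < 0:
        # Post fires before pre (anti-causal pairing): depression.
        return -a_minus * math.exp(delta_t_ms / tau_minus_ms)
    return 0.0  # simultaneous spikes: no change (a convention chosen here)


def apply_stdp(weight: float, pre_spikes_ms: Sequence[float], post_spikes_ms: Sequence[float],
               w_min: float = 0.0, w_max: float = 1.0) -> float:
    """Sum updates over all pre/post spike pairs, then clip the weight to [w_min, w_max]."""
    dw = sum(stdp_weight_change(t_post - t_pre)
             for t_pre in pre_spikes_ms for t_post in post_spikes_ms)
    return min(w_max, max(w_min, weight + dw))
```

Summing over all spike pairs is one common pairing convention; restricting updates to nearest-neighbour spike pairs is another.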
Engineering · Preprint

Asymmetric Proxy Loss for Multi-View Acoustic Word Embeddings

Myunghun Jung, Hoirin Kim · 2022

Acoustic word embeddings (AWEs) are discriminative representations of speech segments, and the learned embedding space reflects the phonetic similarity between words. With multi-view learning, where text …

Engineering · Preprint

Real-time Speaker counting in a cocktail party scenario using Attention-guided Convolutional Neural Network

Midia Yousefi, John H.L. Hansen · 2021

Most current speech technology systems are designed to operate well even in the presence of multiple active speakers. However, most solutions assume that the number of concurrent speakers is known. Un…

Engineering · Preprint

Graph-PIT: Generalized permutation invariant training for continuous separation of arbitrary numbers of speakers

Thilo von Neumann, Keisuke Kinoshita, Christoph Boeddeker, Marc Delcroix, Reinhold Haeb-Umbach · 2021

Automatic transcription of meetings requires handling of overlapped speech, which calls for continuous speech separation (CSS) systems. The uPIT criterion was proposed for utterance-level separation w…

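
For context on the uPIT criterion mentioned above: utterance-level permutation invariant training scores every assignment of separator outputs to reference speakers over the whole utterance and trains on the best one, which resolves the ambiguity of which output should carry which speaker. This is a minimal pure-Python sketch, not the Graph-PIT method of the entry. It assumes mean squared error as the per-source loss (an illustrative choice; signal-level losses such as SI-SDR are also common) and equal numbers of outputs and speakers.

```python
from itertools import permutations
from typing import Sequence, Tuple

Signal = Sequence[float]


def mse(a: Signal, b: Signal) -> float:
    """Mean squared error between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def upit_loss(estimates: Sequence[Signal],
              references: Sequence[Signal]) -> Tuple[float, Tuple[int, ...]]:
    """Utterance-level PIT: best mean loss over all output-to-speaker assignments.

    The permutation is chosen once for the whole utterance, not per frame.
    The search costs n! evaluations for n speakers.
    """
    n = len(references)
    if len(estimates) != n:
        raise ValueError("expected one estimate per reference speaker")
    best_loss = float("inf")
    best_perm: Tuple[int, ...] = tuple(range(n))
    for perm in permutations(range(n)):
        # perm[s] is the index of the estimate paired with reference speaker s.
        loss = sum(mse(estimates[perm[s]], references[s]) for s in range(n)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

If the separator emits the two speakers in swapped order, the identity assignment scores badly but the swapped one scores zero, so the model is not penalized for output order.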
Engineering · Preprint

Relaxed Attention: A Simple Method to Boost Performance of End-to-End Automatic Speech Recognition

Timo Lohrenz, Patrick Schwarz, Zhengyang Li, Tim Fingscheidt · 2021

Recently, attention-based encoder-decoder (AED) models have shown high performance for end-to-end automatic speech recognition (ASR) across several tasks. Addressing overconfidence in such models, in …

Engineering · Preprint

Head-synchronous Decoding for Transformer-based Streaming ASR

Mohan Li, Catalin Zorila, Rama Doddipatla · 2021

Online Transformer-based automatic speech recognition (ASR) systems have been extensively studied due to the increasing demand for streaming applications. Recently proposed Decoder-end Adaptive Comput…

Engineering · Preprint

Multi-Encoder Learning and Stream Fusion for Transformer-Based End-to-End Automatic Speech Recognition

Timo Lohrenz, Zhengyang Li, Tim Fingscheidt · 2021

Stream fusion, also known as system combination, is a common technique in automatic speech recognition for traditional hybrid hidden Markov model approaches, yet mostly unexplored for modern deep neur…

Engineering · Preprint

Unidirectional Memory-Self-Attention Transducer for Online Speech Recognition

Jian Luo, Jianzong Wang, Ning Cheng, Jing Xiao · 2021

Self-attention models have been successfully applied in end-to-end speech recognition systems, greatly improving recognition accuracy. However, such attention-based models canno…

Engineering · Preprint

Do End-to-End Speech Recognition Models Care About Context?

Lasse Borgholt, Jakob Drachmann Havtorn, Zeljko Agic, Anders Søgaard, Lars Maaløe, Christian Igel · 2021

The two most common paradigms for end-to-end speech recognition are connectionist temporal classification (CTC) and attention-based encoder-decoder (AED) models. It has been argued that the latter is …

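
Background on the first paradigm named above: a CTC model emits one label, or a special blank, per frame. The simplest way to turn frame-level predictions into an output sequence is greedy decoding: take the best label per frame, merge consecutive repeats, then drop blanks. A minimal sketch, assuming integer label IDs with the blank at index 0 (a common but not universal convention) and per-frame scores given as plain lists; it is not the setup used in the paper.

```python
from typing import List, Sequence


def argmax_frames(frame_scores: Sequence[Sequence[float]]) -> List[int]:
    """Pick the highest-scoring label index in each frame."""
    return [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]


def ctc_greedy_collapse(frame_labels: Sequence[int], blank: int = 0) -> List[int]:
    """Collapse per-frame CTC labels: merge consecutive repeats, then remove blanks."""
    output: List[int] = []
    previous = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame and is not blank.
        if label != previous and label != blank:
            output.append(label)
        previous = label
    return output
```

For example, frames `[0, 1, 1, 0, 1, 2, 2, 0]` collapse to `[1, 1, 2]`: the blank between the two runs of `1` is what lets a genuinely repeated label survive.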
Engineering · Preprint

Deep Learning based Multi-Source Localization with Source Splitting and its Effectiveness in Multi-Talker Speech Recognition

Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu · 2021

Multi-source localization is an important and challenging technique for multi-talker conversation analysis. This paper proposes a novel supervised learning method using deep neural networks to estimat…

Engineering · Preprint

Efficient Neural Architecture Search for End-to-end Speech Recognition via Straight-Through Gradients

Huahuan Zheng, Keyu An, Zhijian Ou · 2020

Neural Architecture Search (NAS), the process of automating architecture engineering, is an appealing next step to advancing end-to-end Automatic Speech Recognition (ASR), replacing expert-designed ne…

Page 1 of 2