Conventionally, Automatic Speech Recognition (ASR) systems are evaluated on their ability to correctly recognize each word contained in a speech signal. In this context, the word error rate (WER) metr…
Evaluating automatic speech recognition (ASR) systems is a classical but difficult and still open problem, which often boils down to focusing only on the word error rate (WER). However, this metric su…
Motivated by an optimal-matching problem (Leighton-Shor) and the random-field Ising model (Aizenman-Wehr, Ding-Wirth), we consider a variational problem for graphs in $1+1$ dimensions maximizing an act…
Let $F$ be a finite field of odd characteristic. We prove that any set $A\subset F$ with $|A|\geq C|F|^{5/6}$ contains a nontrivial quadratic progression $(x, x+y, x+y^2), y\neq 0.$ For prime fields, …
Multi-talker automatic speech recognition (ASR) in conversational recordings remains an open problem, particularly in scenarios with a large portion of overlapping speech, where identifying and transcrib…
Large Language Models (LLMs) are now widely used for query reformulation and expansion in Information Retrieval, with many studies reporting substantial effectiveness gains. However, these results are…
Accented automatic speech recognition (ASR) often degrades due to the limited availability of accented training data. Prior work has explored accent modeling in low-resource settings, but existing app…
Conventional neural speech codecs suffer from severe intelligibility degradation at ultra-low bitrates, where the bottleneck transitions from acoustic distortion to semantic loss. To address this issu…
Preserving a speaker's voice identity while generating speech in a different language remains a fundamental challenge in spoken language technology, particularly in specialized domains such as scienti…
We give a simple description of the coordinate ring of the universal centralizer associated to a simply connected semisimple group. To this end, we prove a general result on Weil restriction of affine…
Real-time automatic speech recognition (ASR) systems face a fundamental trade-off between transcription accuracy and computational efficiency, particularly when deploying large-scale transformer model…
Standard text-to-speech (TTS) evaluation measures intelligibility (WER, CER) and overall naturalness (MOS, UTMOS) but does not quantify accent. A synthesiser may score well on all four yet sound non-n…
Commercial TTS systems produce near-native Indic audio, but the best open-source bases (Chatterbox, Indic Parler-TTS, IndicF5) trail them on measured phonological dimensions, and the most widely adopt…
This is the authors' response to commentaries on the original article "H is for Human and How (Not) to Evaluate Qualitative Research in HCI" (https://doi.org/10.1080/07370024.2025.2475743). Commentaries we…
We give a metaplectic proof of Hilbert reciprocity, and hence of quadratic reciprocity, in which the local phase is the Kashiwara--Maslov phase of a triple of Lagrangians. In rank two the phase of the…
The digitization of multi-domain retail billing documents remains a challenging task due to variability in scan quality, layout heterogeneity, and domain diversity across commercial sectors. This pape…
Unsupervised domain adaptation generalizes neural retrievers to an unseen domain by generating pseudo queries on target domain documents. The quality and efficiency of this adaptation critically depen…
This paper uncovers an exact $\chi^2$ dissipation identity for the Blahut--Arimoto (BA) flow and establishes its fundamental information-geometric structure. While prior works have analyzed BA converg…
Integral representations for continuous polynomial local functionals on convex functions are established in terms of a finite family of polynomials. This result is obtained by approximation from a cla…
Modern retrieval pipelines increasingly serve downstream consumers like retrieval-augmented generation (RAG) and autonomous agents that need more than a scalar relevance score. A reranker that only te…