Claudio Gutierrez — Research Repository

AI & Data Science · Preprint

Are DeepFakes Realistic Enough? Exploring Semantic Mismatch as a Novel Challenge

Sharayu Nilesh Deshmukh, Kailash A. Hambarde, Joana C. Costa, Hugo Proenca, Tiago Roxo · 2026

Current DeepFake detection scenarios are mostly binary, yet data manipulation can vary across audio, video, or both, whose variability is not captured in binary settings. Four-class audio-visual formu…

AI & Data Science · Preprint

Beyond the Baseband: Adaptive Multi-Band Encoding for Full-Spectrum Bioacoustics Classification

Eklavya Sarkar, Marius Miron, David Robinson, Gagan Narula, Milad Alizadeh, Ellen Gilsenan-McMahon, Emmanuel Chemla, Olivier Pietquin, Matthieu Geist · 2026

Animals hear and vocalize across frequency ranges that differ substantially from humans, often extending into the ultrasonic domain. Yet most computational bioacoustics systems rely on audio models pr…

Engineering · Preprint

LRS-VoxMM: A benchmark for in-the-wild audio-visual speech recognition

Doyeop Kwak, Jeongsoo Choi, Suyeon Lee, Joon Son Chung · 2026

We introduce LRS-VoxMM, an in-the-wild benchmark for audio-visual speech recognition (AVSR). The benchmark is derived from VoxMM, a dataset of diverse real-world spoken conversations with human-annota…

AI & Data Science · Preprint

KellyBench: A Benchmark for Long-Horizon Sequential Decision Making

Thomas Grady, Kip Parker, Iliyan Zarov, Henry Course, Chengxi Taylor, Ross Taylor · 2026

Language models are saturating benchmarks for procedural tasks with narrow objectives. But they are increasingly being deployed in long-horizon, non-stationary environments with open-ended goals. In t…

AI & Data Science · Preprint

The TEA Nets framework combines AI and cognitive network science to model targets, events and actors in text

Sebastiano Franchini, Alexis Carrillo, Edoardo Sebastiano De Duro, Riccardo Improta, Ali Aghazadeh Ardebili, Massimo Stella · 2026

We introduce Target-Event-Agent Networks (TEA Nets) as a computational framework to extract subjects ("Agents"), verbs ("Events"), and objects ("Targets") from texts. Grounded in cognitive network …

AI & Data Science · Preprint

AppTek Call-Center Dialogues: A Multi-Accent Long-Form Benchmark for English ASR

Eugen Beck, Sarah Beranek, Uma Moothiringote, Daniel Mann, Wilfried Michel, Katie Nguyen, Taylor Tragemann · 2026

Evaluating English ASR systems for conversational AI applications remains difficult, as many publicly available corpora are either pre-segmented into short segments, consist of read or prepared speech…

AI & Data Science · Preprint

Entropy of Ukrainian

Anton Lavreniuk, Mykyta Mudryi, Markiian Chaklosh · 2026

In natural language processing, the entropy of a language is a measure of its unpredictability and complexity. The first study on this subject was conducted by Claude Shannon in 1951. By having partic…

Engineering · Preprint

BUT System Description for CHiME-9 MCoRec Challenge

Dominik Klement, Alexander Polok, Nguyen Hai Phong, Prachi Singh, Lukas Burget · 2026

Multi-talker automatic speech recognition (ASR) in conversational recordings remains an open problem, particularly in scenarios with a large portion of overlapping speech where identifying and transcrib…

Engineering · Preprint

A Knowledge-Driven Approach to Target Speech Extraction in the Presence of Background Sound Effects for Cinematic Audio Source Separation (CASS)

Chun-wei Ho, Sabato Marco Siniscalchi, Kai Li, Chin-Hui Lee · 2026

We propose a knowledge-driven approach to target speech extraction in the presence of background sound effects already recorded in cinematic audio. The specific knowledge sources studied are manners o…

AI & Data Science · Preprint

End-to-End Evaluation and Governance of an EHR-Embedded AI Agent for Clinicians

Aaryan Shah, Andrew Hines, Alexia Downs, Denis Bajet, Paulius Mui, Fabiano Araujo, Laura Offutt, Aida Rutledge, Elizabeth Jimenez · 2026

Clinical AI systems require not just point-in-time evaluation but continuous governance: the ongoing practice of monitoring, evaluating, iterating, and re-evaluating performance throughout deployment.…

Computer Science · Preprint

Predicting Upcoming Stuttering Events from Three-Second Audio: Stratified Evaluation Reveals Severity-Selective Precursors, and the Model Deploys Fully On-Device

Nazar Kozak · 2026

Audio-based stuttering systems to date have been trained for detection -- what disfluency is present now -- leaving prediction, the capability needed for closed-loop intervention, unstudied at deploya…

AI & Data Science · Preprint

The Inverse-Wisdom Law: Architectural Tribalism and the Consensus Paradox in Agentic Swarms

Dahlia Shehata, Ming Li · 2026

As AI transitions toward multi-agent systems (MAS) to solve complex workflows, research paradigms operate on the axiomatic assumption that agent collaboration mirrors the "Wisdom of the Crowd". We cha…

AI & Data Science · Preprint

When Roles Fail: Epistemic Constraints on Advocate Role Fidelity in LLM-Based Political Statement Analysis

Juergen Dietrich · 2026

Democratic discourse analysis systems increasingly rely on multi-agent LLM pipelines in which distinct evaluator models are assigned adversarial roles to generate structured, multi-perspective assessm…

AI & Data Science · Preprint

Cross-Lingual Response Consistency in Large Language Models: An ILR-Informed Evaluation of Claude Across Six Languages

Camelia Baluta · 2026

This paper introduces a systematic evaluation framework grounded in the Interagency Language Roundtable (ILR) Skill Level Descriptions and applies it to Claude (Sonnet 4.6) across six languages: Engli…

AI & Data Science · Preprint

Multiple Additive Neural Networks for Structured and Unstructured Data

Janis Mohr, Jorg Frochte · 2026

This paper extends and explains the Multiple Additive Neural Networks (MANN) methodology, an enhancement to the traditional Gradient Boosting framework, utilizing nearly shallow neural networks instea…

Computer Science · Preprint

Transferability of Token Usage Rights: A Design Space Analysis of Generative AI Services

Jaeyong Lee, Heeju Kang, Ahra Cho, Baek Eunkyung · 2026

With the rapid spread of generative AI services, the token has gained value not only as a technical unit of language processing but also as an economic currency for accessing AI services. Major AI mod…

Computer Science · Preprint

A Toolkit for Detecting Spurious Correlations in Speech Datasets

Lara Gauder, Pablo Riera, Andrea Slachevsky, Gonzalo Forno, Adolfo M. Garcia, Luciana Ferrer · 2026

We introduce a toolkit for uncovering spurious correlations between recording characteristics and target class in speech datasets. Spurious correlations may arise due to heterogeneous recording condit…

Computer Science · Preprint

Diffusion Reconstruction towards Generalizable Audio Deepfake Detection

Bo Cheng, Songjun Cao, Xiaoming Zhang, Jie Chen, Long Ma, Fei Chen · 2026

Achieving robust generalization against unseen attacks remains a challenge in Audio Deepfake Detection (ADD), driven by the rapid evolution of generative models. To address this, we propose a framewor…

AI & Data Science · Preprint

Attribution-Guided Multimodal Deepfake Detection via Cross-Modal Forensic Fingerprints

Wasim Ahmad, Wei Zhang, Xuerui Mao · 2026

Audio-visual deepfakes have reached a level of realism that makes perceptual detection unreliable, threatening media integrity and biometric security. While multimodal detection has shown promise, mos…

AI & Data Science · Preprint

DSIPA: Detecting LLM-Generated Texts via Sentiment-Invariant Patterns Divergence Analysis

Siyuan Li, Aodu Wulianghai, Guangyan Li, Xi Lin, Qinghua Mao, Yuliang Chen, Jun Wu, Jianhua Li · 2026

The rapid advancement of large language models (LLMs) presents new security challenges, particularly in detecting machine-generated text used for misinformation, impersonation, and content forgery. Mo…
