Abudit Rai — Research Repository

Computer Science Preprint

Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows

Chenxin Li, Zhengyang Tang, Huangxin Lin, Yunlong Lin, Shijue Huang, Shengyuan Liu, Bowen Ye, Rang Li, Lei Li, Benyou Wang, Yixuan Yuan · 2026

LLM agents are expected to complete end-to-end units of work across software tools, business services, and local workspaces. Yet many agent benchmarks freeze a curated task set at release time and gra…

Mathematics Preprint

Turán-Type Extremal Results for Distance-k Graphs

Zhen He, Nika Salia, Casey Tompkins, Xiutao Zhu · 2026

We study Turán-type extremal problems for distance graphs, motivated by work of Csikvári, Bollobás, Tyomkyn, and Uzzell. We determine the maximum number of vertex pairs at distance three in an $…

AI & Data Science Preprint

Are DeepFakes Realistic Enough? Exploring Semantic Mismatch as a Novel Challenge

Sharayu Nilesh Deshmukh, Kailash A. Hambarde, Joana C. Costa, Hugo Proenca, Tiago Roxo · 2026

Current DeepFake detection scenarios are mostly binary, yet data manipulation can vary across audio, video, or both, whose variability is not captured in binary settings. Four-class audio-visual formu…

AI & Data Science Preprint

Beyond the Baseband: Adaptive Multi-Band Encoding for Full-Spectrum Bioacoustics Classification

Eklavya Sarkar, Marius Miron, David Robinson, Gagan Narula, Milad Alizadeh, Ellen Gilsenan-McMahon, Emmanuel Chemla, Olivier Pietquin, Matthieu Geist · 2026

Animals hear and vocalize across frequency ranges that differ substantially from humans, often extending into the ultrasonic domain. Yet most computational bioacoustics systems rely on audio models pr…

Neuroscience Preprint

On Agentic Behavioral Modeling

Dirk Ostwald, Rasmus Bruckner, Franziska Usee, Belinda Fleischmann, Joram Soch, Sean Mulready · 2026

Integrating theoretical neuroscience, decision theory, and probabilistic inference offers a promising route to understanding human cognition, yet concrete methodological bridges between agentic AI mod…

Engineering Preprint

LRS-VoxMM: A benchmark for in-the-wild audio-visual speech recognition

Doyeop Kwak, Jeongsoo Choi, Suyeon Lee, Joon Son Chung · 2026

We introduce LRS-VoxMM, an in-the-wild benchmark for audio-visual speech recognition (AVSR). The benchmark is derived from VoxMM, a dataset of diverse real-world spoken conversations with human-annota…

Physics Preprint

Spin-coherence characterization of boron vacancy defects in hexagonal boron nitride with broadband microwave pulses

Yuki Nakamura, Takuya Iwasaki, Shu Nakaharai, Shinichi Ogawa, Yukinori Morita, Kenji Watanabe, Takashi Taniguchi, Kento Sasaki, Kensuke Kobayashi · 2026

Negatively charged boron vacancy (VB-) defects in hexagonal boron nitride (hBN) are promising for nanoscale-proximity quantum sensing. To evaluate their performance, it is important to characterize th…

Computer Science Preprint

WOOTdroid: Whole-system Online On-device Tracing for Android

Simon Althaus, Nikolaos Alexopoulos, Max Muhlhauser, Christian Reuter, Ephraim Zimmer · 2026

System auditing on Android faces two problems. First, existing syscall tracers lose events under load, silently overwriting entries faster than a user space reader can drain them. Second, security-rel…

AI & Data Science Preprint

Auditing Frontier Vision-Language Models for Trustworthy Medical VQA: Grounding Failures, Format Collapse, and Domain Adaptation

Xupeng Chen, Binbin Shi, Chenqian Le, Qifu Yin, Lang Lin, Haowei Ni, Ran Gong, Panfeng Li · 2026

Deploying vision-language models (VLMs) in clinical settings demands auditable behavior under realistic failure conditions, yet the failure landscape of frontier VLMs on specialized medical inputs is …

AI & Data Science Preprint

Political Bias Audits of LLMs Capture Sycophancy to the Inferred Auditor

Petter Tornberg, Michelle Schimmel · 2026

Large language models (LLMs) are commonly evaluated for political bias based on their responses to fixed questionnaires, which typically place frontier models on the political left. A parallel literat…

AI & Data Science Preprint

Mapping how LLMs debate societal issues when shadowing human personality traits, sociodemographics and social media behavior

Ali Aghazadeh Ardebili, Massimo Stella · 2026

Large Language Models (LLMs) can strongly shape social discourse, yet datasets investigating how LLM outputs vary across controlled social and contextual prompting remain sparse. Cognitive Digital Sha…

Computer Science Preprint

Purifying Multimodal Retrieval: Fragment-Level Evidence Selection for RAG

Xihang Wang, Zihan Wang, Chengkai Huang, Cao Liu, Ke Zeng, Quan Z. Sheng, Lina Yao · 2026

Multimodal Retrieval-Augmented Generation (MRAG) is widely adopted for Multimodal Large Language Models (MLLMs) with external evidence to reduce hallucinations. Despite its success, most existing MRAG…

AI & Data Science Preprint

AppTek Call-Center Dialogues: A Multi-Accent Long-Form Benchmark for English ASR

Eugen Beck, Sarah Beranek, Uma Moothiringote, Daniel Mann, Wilfried Michel, Katie Nguyen, Taylor Tragemann · 2026

Evaluating English ASR systems for conversational AI applications remains difficult, as many publicly available corpora are either pre-segmented into short segments, consist of read or prepared speech…

Computer Science Preprint

Empire Amplifier: Uncovering and Contesting the Prioritization of Colonial Content on Platforms Through Community-Informed Algorithmic Auditing

Nel Escher, Bakyt Yrysov, Ashley McDermott, Daniel Chechelnitsky, Hermela Berehan Benyam, Nikola Banovic · 2026

Though online platforms claim to amplify Indigenous voices, Indigenous communities are worried that these systems are instead eroding their language and culture. We conduct a community-informed algori…

Physics Preprint

First Detection of Faraday Rotation in a Gamma-Ray Burst Afterglow: Low Polarization and High Rotation Measure in GRB 260310A Reveal Jet Magnetic Structure and Environment

Collin T. Christy, Tanmoy Laskar, Kate D. Alexander, Noah Franz, Jonathan Granot, Ryan Chornock, Raffaella Margutti, Ramandeep Gill, Jeniveve Pearson, Edo Berger, Wen-fai Fong, Coleman Rohde, Patricia Schady · 2026

We report the detection of linear polarization in the radio afterglow of GRB 260310A, representing the first centimeter-wavelength polarization detection of a gamma-ray burst (GRB) afterglow and the f…

Engineering Preprint

BUT System Description for CHiME-9 MCoRec Challenge

Dominik Klement, Alexander Polok, Nguyen Hai Phong, Prachi Singh, Lukas Burget · 2026

Multi-talker automatic speech recognition (ASR) in conversational recordings remains an open problem, particularly in scenarios with a large portion of overlapping speech where identifying and transcrib…

Engineering Preprint

A Knowledge-Driven Approach to Target Speech Extraction in the Presence of Background Sound Effects for Cinematic Audio Source Separation (CASS)

Chun-wei Ho, Sabato Marco Siniscalchi, Kai Li, Chin-Hui Lee · 2026

We propose a knowledge-driven approach to speech target extraction in the presence of background sound effects already recorded in cinematic audio. The specific knowledge sources studied are manners o…

AI & Data Science Preprint

Measurement Risk in Supervised Financial NLP: Rubric and Metric Sensitivity on JF-ICR

Sidi Chang, Peiying Zhu, Yuxiao Chen, Rongdong Chai · 2026

As LLMs become credible readers of earnings calls, investor-relations Q&A, guidance, and disclosure language, supervised financial NLP benchmarks increasingly function as decision evidence for model …

AI & Data Science Preprint

TypeBandit: Type-Level Context Allocation and Reweighting for Effective Attribute Completion in Heterogeneous Graph Neural Networks

Ta-Yang Wang, Rajgopal Kannan, Viktor Prasanna · 2026

Heterogeneous graphs are widely used to model multi-relational systems, but missing node attributes remain a major bottleneck for downstream learning. In this paper, we identify and formalize type-dep…

AI & Data Science Preprint

End-to-End Evaluation and Governance of an EHR-Embedded AI Agent for Clinicians

Aaryan Shah, Andrew Hines, Alexia Downs, Denis Bajet, Paulius Mui, Fabiano Araujo, Laura Offutt, Aida Rutledge, Elizabeth Jimenez · 2026

Clinical AI systems require not just point-in-time evaluation but continuous governance: the ongoing practice of monitoring, evaluating, iterating, and re-evaluating performance throughout deployment.…
