Claudia Clopath — Research Repository

AI & Data Science · Preprint

KellyBench: A Benchmark for Long-Horizon Sequential Decision Making

Thomas Grady, Kip Parker, Iliyan Zarov, Henry Course, Chengxi Taylor, Ross Taylor · 2026

Language models are saturating benchmarks for procedural tasks with narrow objectives. But they are increasingly being deployed in long-horizon, non-stationary environments with open-ended goals. In t…

AI & Data Science · Preprint

The TEA Nets framework combines AI and cognitive network science to model targets, events and actors in text

Sebastiano Franchini, Alexis Carrillo, Edoardo Sebastiano De Duro, Riccardo Improta, Ali Aghazadeh Ardebili, Massimo Stella · 2026

We introduce Target-Event-Agent Networks (TEA Nets) as a computational framework to extract subjects ("Agents"), verbs ("Events"), and objects ("Targets") from texts. Grounded in cognitive network …

AI & Data Science · Preprint

Entropy of Ukrainian

Anton Lavreniuk, Mykyta Mudryi, Markiian Chaklosh · 2026

In natural language processing, the entropy of a language is a measure of its unpredictability and complexity. The first study on this subject was conducted by Claude Shannon in 1951. By having partic…

AI & Data Science · Preprint

The Inverse-Wisdom Law: Architectural Tribalism and the Consensus Paradox in Agentic Swarms

Dahlia Shehata, Ming Li · 2026

As AI transitions toward multi-agent systems (MAS) to solve complex workflows, research paradigms operate on the axiomatic assumption that agent collaboration mirrors the "Wisdom of the Crowd". We cha…

AI & Data Science · Preprint

When Roles Fail: Epistemic Constraints on Advocate Role Fidelity in LLM-Based Political Statement Analysis

Juergen Dietrich · 2026

Democratic discourse analysis systems increasingly rely on multi-agent LLM pipelines in which distinct evaluator models are assigned adversarial roles to generate structured, multi-perspective assessm…

AI & Data Science · Preprint

Cross-Lingual Response Consistency in Large Language Models: An ILR-Informed Evaluation of Claude Across Six Languages

Camelia Baluta · 2026

This paper introduces a systematic evaluation framework grounded in the Interagency Language Roundtable (ILR) Skill Level Descriptions and applies it to Claude (Sonnet 4.6) across six languages: Engli…

Computer Science · Preprint

Transferability of Token Usage Rights: A Design Space Analysis of Generative AI Services

Jaeyong Lee, Heeju Kang, Ahra Cho, Baek Eunkyung · 2026

With the rapid spread of generative AI services, the token has gained value not only as a technical unit of language processing but also as an economic currency for accessing AI services. Major AI mod…

AI & Data Science · Preprint

DSIPA: Detecting LLM-Generated Texts via Sentiment-Invariant Patterns Divergence Analysis

Siyuan Li, Aodu Wulianghai, Guangyan Li, Xi Lin, Qinghua Mao, Yuliang Chen, Jun Wu, Jianhua Li · 2026

The rapid advancement of large language models (LLMs) presents new security challenges, particularly in detecting machine-generated text used for misinformation, impersonation, and content forgery. Mo…

Computer Science · Preprint

Agentic AI in the Software Development Lifecycle: Architecture, Empirical Evidence, and the Reshaping of Software Engineering

Happy Bhati · 2026

The arrival of large language models (LLMs) capable of multi-step reasoning, tool use, and long-horizon planning has produced a qualitative shift in software engineering. Where earlier code-completion…

Computer Science · Preprint

LUCid: Redefining Relevance For Lifelong Personalization

Chimaobi Okite, Anika Misra, Joyce Chai, Rada Mihalcea · 2026

Current approaches to lifelong personalization operationalize relevance through semantic proximity, causing them to miss essential user information from topically unrelated interactions. To address th…

AI & Data Science · Preprint

Semantic Layers for Reliable LLM-Powered Data Analytics: A Paired Benchmark of Accuracy and Hallucination Across Three Frontier Models

Michael Rumiantsau, Ivan Fokeev · 2026

LLMs deployed for natural-language querying of analytical databases suffer from two intertwined failures - incorrect answers and confident hallucinations - both rooted in the same cause: the model is …

AI & Data Science · Preprint

One Perturbation, Two Failure Modes: Probing VLM Safety via Embedding-Guided Typographic Perturbations

Ravikumar Balakrishnan, Sanket Mendapara · 2026

Typographic prompt injection exploits vision language models' (VLMs) ability to read text rendered in images, posing a growing threat as VLMs power autonomous agents. Prior work typically focuses on max…

AI & Data Science · Preprint

Frontier Coding Agents Can Now Implement an AlphaZero Self-Play Machine Learning Pipeline For Connect Four That Performs Comparably to an External Solver

Joshua Sherwood, Ben Aybar, Benjamin Kaplan · 2026

Forecasting when AI systems will become capable of meaningfully accelerating AI research is a central challenge for AI safety. Existing benchmarks measure broad capability growth, but may not provide …

AI & Data Science · Preprint

Faithful Autoformalization via Roundtrip Verification and Repair

Daneshvar Amrollahi, Jerry Lopez, Clark Barrett · 2026

When an LLM formalizes natural language, how do we know the output is faithful? We propose a roundtrip verification approach which does not require ground-truth annotations: formalize a statement, tra…

Physics · Preprint

spectroxide: A code package for computing cosmic microwave background spectral distortions

Ethan Baker, Hongwan Liu, Siddharth Mishra-Sharma · 2026

We present spectroxide, a code package for computing cosmic microwave background spectral distortions in which all ~14,500 lines of Rust code, Python interface, and ~400 automated test…

Computer Science · Preprint

Defective Task Descriptions in LLM-Based Code Generation: Detection and Analysis

Amal Akli, Mike Papadakis, Maxime Cordy, Yves Le Traon · 2026

Large language models are widely used for code generation, yet they rely on an implicit assumption that the task descriptions are sufficiently detailed and well-formed. However, in practice, users may…

AI & Data Science · Preprint

Can Current Agents Close the Discovery-to-Application Gap? A Case Study in Minecraft

Zhou Ziheng, Huacong Tang, Jinyuan Zhang, Haowei Lin, Bangcheng Yang, Qian Long, Fang Sun, Yizhou Sun, Yitao Liang, Ying Nian Wu, Demetri Terzopoulos, Xiaofeng Gao · 2026

Discovering causal regularities and applying them to build functional systems--the discovery-to-application loop--is a hallmark of general intelligence, yet evaluating this capacity has been hindered …

AI & Data Science · Preprint

Evaluating whether AI models would sabotage AI safety research

Robert Kirk, Alexandra Souly, Kai Fronsdal, Abby D'Cruz, Xander Davies · 2026

We evaluate the propensity of frontier models to sabotage or refuse to assist with safety research when deployed as AI research agents within a frontier AI company. We apply two complementary evaluati…

AI & Data Science · Preprint

Generating Place-Based Compromises Between Two Points of View

Sumanta Bhattacharyya, Francine Chen, Scott Carter, Yan-Ying Chen, Tatiana Lau, Nayeli Suseth Bravo, Monica P. Van, Kate Sieck, Charlene C. Wu · 2026

Large Language Models (LLMs) excel academically but struggle with social intelligence tasks, such as creating good compromises. In this paper, we present methods for generating empathically neutral co…

AI & Data Science · Preprint

Beyond the Attention Stability Boundary: Agentic Self-Synthesizing Reasoning Protocols

Dahlia Shehata, Ming Li · 2026

As LLM agents transition to autonomous digital coworkers, maintaining deterministic goal-directedness in non-linear multi-turn conversations has emerged as an architectural bottleneck. We identify and for…
