James Tuite in Computer Science — Research Repository

Computer Science Preprint PDF DOI

SimEval-IR: A Unified Toolkit and Benchmark Suite for Evaluating User Simulators and Search Sessions

Saber Zerhoudi · 2026

User simulators are increasingly central to interactive information retrieval, yet the community lacks standardized evaluation tools. Simulators serve two objectives, behavioral realism (matching real…

WOOTdroid: Whole-system Online On-device Tracing for Android

Simon Althaus, Nikolaos Alexopoulos, Max Muhlhauser, Christian Reuter, Ephraim Zimmer · 2026

System auditing on Android faces two problems. First, existing syscall tracers lose events under load, silently overwriting entries faster than a user space reader can drain them. Second, security-rel…

"It depends on where AI is used": Players' attitude patterns and evaluative logics toward different AI applications in digital games

Ting-Chen Hsu, Jiangxu Lin, Wenran Chen, Fei Qin, Zheyuan Zhang · 2026

As AI becomes increasingly embedded in digital games, players' attitudes depend not only on whether AI is used, but also on where and how it intervenes in gameplay. This study examines players' evalu…

Test Before You Deploy: Governing Updates in the LLM Supply Chain

Mohd Sameen Chishti, Damilare Peter Oyinloye, Jingyue Li · 2026

Large Language Models (LLMs) are increasingly used as core dependencies in software systems. However, the hosted LLM services evolve continuously through provider-side updates without explicit version…

How Code Representation Shapes False-Positive Dynamics in Cross-Language LLM Vulnerability Detection

Maofei Chen, Laifu Wang, Yue Qin, Yuan Wang, Bo Wu, Dongxin Liu · 2026

How code representation format shapes false positive behaviour in cross-language LLM vulnerability detection remains poorly understood. We systematically vary training intensity and code representatio…

ScaleBox: Enabling High-Fidelity and Scalable Code Verification for Large Language Models

Jiasheng Zheng, Xin Zheng, Boxi Cao, Pengbo Wang, Zhengzhao Ma, Qiming Zhu, Jiazhen Jiang, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun · 2026

Code sandboxes have emerged as a critical infrastructure for advancing the coding capabilities of large language models, providing verifiable feedback for both RL training and evaluation. However, exi…

Tracking Conversations: Measuring Content and Identity Exposure on AI Chatbots

Muhammad Jazlan, Ethan Wang, Yash Vekaria, Zubair Shafiq · 2026

AI chatbots are becoming a primary interface for seeking information. As their popularity grows, chatbot providers are starting to deploy advertising and analytics. Despite this, tracking on AI chatbo…

REBENCH: A Procedural, Fair-by-Construction Benchmark for LLMs on Stripped-Binary Types and Names (Extended Version)

Jun Yeon Won, Xin Jin, Shiqing Ma, Zhiqiang Lin · 2026

Large Language Models (LLMs) have achieved remarkable progress in recent years, driving their adoption across a wide range of domains, including computer security. In reverse engineering, LLMs are inc…

What Suppresses Nash Equilibrium Play in Large Language Models? Mechanistic Evidence and Causal Control

Paraskevas V. Lekeas, Giorgos Stamatopoulos · 2026

LLM agents are known to deviate from Nash equilibria in strategic interactions, but nobody has looked inside the model to understand why, or asked whether the deviation can be reversed. We do both. …

On the Complexity of Robust Markov Decision Processes and Bisimulation Metrics

Marnix Suilen, Guillermo A. Perez · 2026

Robust Markov decision processes (RMDPs) extend standard Markov decision processes (MDPs) to account for uncertainty in the transition probabilities. RMDPs have an uncertainty set that defines a set o…

Reproducible Automated Program Repair Is Hard -- Experiences With the Defects4J Dataset

Adam Krafczyk, Klaus Schmid · 2026

In the research of automated program repair (APR), benchmark datasets consisting of known defects in combination with test suites that indicate the defects are of high importance. They allow for an ev…

What Makes Software Bugs Escape Testing? Evidence from a Large-Scale Empirical Study

Domenico Cotroneo, Giuseppe De Rosa, Cristina Improta, Benedetta Gaia Varriale · 2026

Understanding how software defects manifest and evolve in production environments is critical for improving reliability. While previous research has largely focused on pre-release defects, the nature …

Lexical Anthropomorphization Influences on Moral Judgments of AI Bad Behavior

Jaime Banks, Nicholas David Bowman, Roman Saladino · 2026

Anthropomorphic language describing artificial intelligence (AI) is widespread in media, policy, and everyday discourse; so too are discussions of AI bad behavior, from hallucinations to inappropriate…

Commit-Aware Learning-Based Test Case Prioritization for Continuous Integration

Lorenzo Abbondante, Gerardo Canfora · 2026

Regression testing in Continuous Integration (CI) pipelines is increasingly costly due to the growing size and execution frequency of test suites. Test Case Prioritization (TCP) mitigates this problem…

Cutscene Agent: An LLM Agent Framework for Automated 3D Cutscene Generation

Lanshan He, Haozhou Pang, Qi Gan, Xin Shen, Ziwei Zhang, Yibo Liu, Gang Fang, Bo Liu, Kai Sheng, Shengfeng Zeng, Chaofan Li, Zhen Hui, Keer Zhou, Lan Zhou, Shujun Dai · 2026

Cutscenes are carefully choreographed cinematic sequences embedded in video games and interactive media, serving as the primary vehicle for narrative delivery, character development, and emotional eng…

Job-Scheduling Games with Time-Dependent Processing Times

Ido Borenstein, Tami Tamir · 2026

Job-scheduling games have traditionally assumed fixed processing times. However, in many realistic environments, ranging from cyber-security response to high-frequency trading, a task's duration depen…

Asymmetric-Information Resource Allocation Games: An LP Approach to Purposeful Deception

Longxu Pan, Yue Guan, Daigo Shishika, Panagiotis Tsiotras · 2026

In this work, we introduce the Deceptive Resource Allocation Game (DRAG), which studies purposeful deception within a Bayesian game framework. In DRAG, a Defender allocates resources across the true a…

Hierarchies of No-regret Algorithms

R. Xu, E. Yachbes, J. Zhang · 2026

Our paper studies the setting of players using no-regret algorithms in various two-player games. We address whether having stronger regret guarantees or playing against an opponent with weaker regret …

Verification of Correlated Equilibria in Concurrent Reachability Games

Senthil Rajasekaran, Jean-Francois Raskin, Moshe Y. Vardi · 2026

As part of an effort to apply the rigorous guarantees of formal verification to multi-agent systems, the field of equilibrium analysis, also called rational verification, studies equilibria in multipl…

Children's Online Safety Risks and Ethical Considerations in XR Games

Zinan Zhang, Xinning Gui, Yubo Kou · 2026

Emerging extended reality technologies are reshaping how children play, learn, and socialize. Yet, they also present serious safety risks. Gaming, a primary form of entertainment for children, is also…
