Behavior in AI & Data Science — Research Repository

AI & Data Science Preprint PDF DOI

Exploration Hacking: Can LLMs Learn to Resist RL Training?

Eyon Jang, Damon Falck, Joschka Braun, Nathalie Kirch, Achu Menon, Perusha Moodley, Scott Emmons, Roland S. Zimmermann, David Lindner · 2026

Reinforcement learning (RL) has become essential to the post-training of large language models (LLMs) for reasoning, agentic capabilities and alignment. Successful RL relies on sufficient exploration …

Action Motifs: Self-Supervised Hierarchical Representation of Human Body Movements

Genki Kinoshita, Shu Nakamura, Ryo Kawahara, Shohei Nobuhara, Yasutomo Kawanishi, Ko Nishino · 2026

Effective human behavior modeling requires a representation of the human body movement that capitalizes on its compositionality. We propose a hierarchical representation consisting of Action Atoms tha…

Kernel-based independence and mean independence tests for weakly dependent data

Daniel Diz-Castro, Manuel Febrero-Bande, Wenceslao Gonzalez-Manteiga · 2026

We provide a unified framework for independence and mean independence tests based on the Hilbert-Schmidt independence criterion, extending some previous results in the literature to hold in general to…

Characterizing the Consistency of the Emergent Misalignment Persona

Anietta Weckauff, Yuchen Zhang, Maksym Andriushchenko · 2026

Fine-tuning large language models (LLMs) on narrowly misaligned data generalizes to broadly misaligned behavior, a phenomenon termed emergent misalignment (EM). While prior work has found a correlatio…

TopBench: A Benchmark for Implicit Prediction and Reasoning over Tabular Question Answering

An-Yang Ji, Jun-Peng Jiang, De-Chuan Zhan, Han-Jia Ye · 2026

Large Language Models (LLMs) have advanced Table Question Answering, where most queries can be answered by extracting information or simple aggregation. However, a common class of real-world queries i…

RHyVE: Competence-Aware Verification and Phase-Aware Deployment for LLM-Generated Reward Hypotheses

Feiyu Wu, Xu Zheng, Zhuocheng Wang, Yi ming Dai, Hui Li · 2026

Large language models (LLMs) make reward design in reinforcement learning substantially more scalable, but generated rewards are not automatically reliable training objectives. Existing work has focus…

Stable Behavior, Limited Variation: Persona Validity in LLM Agents for Urban Sentiment Perception

Neemias B da Silva, Rodrigo Minetto, Daniel Silver, Thiago H Silva · 2026

Large Language Models (LLMs) are increasingly used as proxies for human perception in urban analysis, yet it remains unclear whether persona prompting produces meaningful and reproducible behavioral d…

Collaborative Agent Reasoning Engineering (CARE): A Three-Party Design Methodology for Systematically Engineering AI Agents with Subject Matter Experts, Developers, and Helper Agents

Rahul Ramachandran, Nidhi Jha, Muthukumaran Ramasubramanian · 2026

We present Collaborative Agent Reasoning Engineering (CARE), a disciplined methodology for engineering Large Language Model (LLM) agents in scientific domains. Unlike ad-hoc trial-and-error approaches…

Models Recall What They Violate: Constraint Adherence in Multi-Turn LLM Ideation

Garvin Kruthof · 2026

When researchers iteratively refine ideas with large language models, do the models preserve fidelity to the original objective? We introduce DriftBench, a benchmark for evaluating constraint adherenc…

Echo-α: Large Agentic Multimodal Reasoning Model for Ultrasound Interpretation

Jing Zhang, Wentao Jiang, Tao Huang, Zhiwei Wang, Jianxin Liu, Jian Chen, Ping Ye, Gang Wang, Zengmao Wang, Bo Du, Dacheng Tao · 2026

Ultrasound interpretation requires both precise lesion localization and holistic clinical reasoning, yet existing methods typically excel at only one of these capabilities: specialized detectors offer…

The Effects of Visual Priming on Cooperative Behavior in Vision-Language Models

Kenneth J. K. Ong · 2026

As Vision-Language Models (VLMs) become increasingly integrated into decision-making systems, it is essential to understand how visual inputs influence their behavior. This paper investigates the effe…

A Collective Variational Principle Unifying Bayesian Inference, Game Theory, and Thermodynamics

Djamel Bouchaffra, Faycal Ykhlef, Mustapha Lebbah, Hanane Azzag · 2026

Collective intelligence emerges across biological, physical, and artificial systems without central coordination, yet a unifying principle governing such behaviour remains elusive. The Free Energy Pri…

Geometry-Calibrated Conformal Abstention for Language Models

Rui Xu, Yi Chen, Sihong Xie, Hui Xiong · 2026

When language models lack relevant knowledge for a given query, they frequently generate plausible responses that can be hallucinations, rather than admitting being agnostic about the answer. Retraini…

Simulating clinical interventions with a generative multimodal model of human physiology

Guy Lutsker, Gal Sapir, Jordi Merino, Smadar Shilo, Anastasia Godneva, Eli Meirom, Shie Mannor, Hagai Rossman, Gal Chechik, Eran Segal · 2026

Understanding how human health changes over time, and why responses to interventions vary between individuals, remains a central challenge in medicine. Here we present HealthFormer, a decoder-only tra…

Modeling Clinical Concern Trajectories in Language Model Agents

Sukesh Subaharan, Venkatesan VS, Murugadasan P, Sivakumar D, Gautham N, Ganeshkumar M · 2026

Large language model (LLM) agents deployed in clinical settings often exhibit abrupt, threshold-driven behavior, offering little visibility into accumulating risk prior to escalation. In real-world ca…

A Grid-Aware Agent-Based Model for Analyzing Electric Vehicle Charging Systems

Khalil Al-Rahman Youssefi, Marija Gojkovic, Walter Stefanutti, Mika Auer, Melanie Schranz · 2026

This paper presents a configurable, grid-aware Agent-Based Model (ABM) for the systematic analysis of electric vehicle (EV) charging systems under configurable infrastructure and operational condition…

MCPHunt: An Evaluation Framework for Cross-Boundary Data Propagation in Multi-Server MCP Agents

Haonan Li, Tianjun Sun, Yongqing Wang, Qisheng Zhang · 2026

Multi-server MCP agents create an information-flow control problem: faithful tool composition can turn individually benign read/write permissions into cross-boundary credential propagation -- a struct…

Focus Session: Autonomous Systems Dependability in the era of AI: Design Challenges in Safety, Security, Reliability and Certification

Behnaz Ranjbar, Kirankumar Raveendiran, Sudeep Pasricha, Samarjit Chakraborty, Cecilia Carbonelli, Akash Kumar · 2026

The design of embedded safety-critical systems such as those used in next-generation automotive and autonomous platforms, is increasingly challenged by escalating system complexity, hardware-software …

Auditing Frontier Vision-Language Models for Trustworthy Medical VQA: Grounding Failures, Format Collapse, and Domain Adaptation

Xupeng Chen, Binbin Shi, Chenqian Le, Qifu Yin, Lang Lin, Haowei Ni, Ran Gong, Panfeng Li · 2026

Deploying vision-language models (VLMs) in clinical settings demands auditable behavior under realistic failure conditions, yet the failure landscape of frontier VLMs on specialized medical inputs is …

Bridging Values and Behavior: A Hierarchical Framework for Proactive Embodied Agents

Chunhui Zhang, Yuxuan Wang, Aoyang Qin, Yi-Long Lu, Kunlun Wu, Yizhou Wang, Wei Wang · 2026

Current embodied agents are often limited to passive instruction-following or reactive need-satisfaction, lacking a stable, high-order value framework essential for long-term, self-directed behavior a…
