Programming Languages · AI & Data Science · Preprint — Research Repository

AI & Data Science Preprint PDF DOI

HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation

Xin Zhou, Dingkang Liang, Xiwu Chen, Feiyang Tan, Dingyuan Zhang, Hengshuang Zhao, Xiang Bai · 2026

Driving world models serve as a pivotal technology for autonomous driving by simulating environmental dynamics. However, existing approaches predominantly focus on future scene generation, often overl…

Exploration Hacking: Can LLMs Learn to Resist RL Training?

Eyon Jang, Damon Falck, Joschka Braun, Nathalie Kirch, Achu Menon, Perusha Moodley, Scott Emmons, Roland S. Zimmermann, David Lindner · 2026

Reinforcement learning (RL) has become essential to the post-training of large language models (LLMs) for reasoning, agentic capabilities and alignment. Successful RL relies on sufficient exploration …

LLM as Clinical Graph Structure Refiner: Enhancing Representation Learning in EEG Seizure Diagnosis

Lincan Li, Zheng Chen, Yushun Dong · 2026

Electroencephalogram (EEG) signals are vital for automated seizure detection, but their inherent noise makes robust representation learning challenging. Existing graph construction methods, whether co…

AEGIS: A Holistic Benchmark for Evaluating Forensic Analysis of AI-Generated Academic Images

Bo Zhang, Tzu-Yen Ma, Zichen Tang, Junpeng Ding, Zirui Wang, Yizhuo Zhao, Peilin Gao, Zijie Xi, Zixin Ding, Haiyang Sun, Haocheng Gao, Yuan Liu, Liangjia Wang, Yiling Huang, Yujie Wang, Yuyue Zhang, Ronghui Xi, Yuanze Li, Jiacheng Liu, Zhongjun Yang, Haihong E · 2026

We introduce AEGIS, A holistic benchmark for Evaluating forensic analysis of AI-Generated academic ImageS. Compared to existing benchmarks, AEGIS features three key advances: (1) Domain-Specific Compl…

PhyCo: Learning Controllable Physical Priors for Generative Motion

Sriram Narayanan, Ziyu Jiang, Srinivasa Narasimhan, Manmohan Chandraker · 2026

Modern video diffusion models excel at appearance synthesis but still struggle with physical consistency: objects drift, collisions lack realistic rebound, and material responses seldom match their un…

On the Proper Treatment of Units in Surprisal Theory

Samuel Kiegeland, Vesteinn Snæbjarnarson, Tim Vieira, Ryan Cotterell · 2026

Surprisal theory links human processing effort to the predictability of an upcoming linguistic unit, but empirical work often leaves the notion of a unit underspecified. In practice, experimental stim…

Normativity and Productivism: Ableist Intelligence? A Degrowth Analysis of AI Sign Language Translation Tools for Deaf People

Nina Seron-Abouelfadil, Poppy Fynes · 2026

Sign languages, of any geographical or accentual variation, understandably face continuous scrutiny under the ever-present popularity of verbal dictation and audism. Through this, many potential probl…

What Makes a Good Terminal-Agent Benchmark Task: A Guideline for Adversarial, Difficult, and Legible Evaluation Design

Ivan Bercovich · 2026

Terminal-agent benchmarks have become a primary signal for measuring the coding and system-administration capabilities of large language models. As the market for evaluation environments grows, so doe…

Characterizing the Consistency of the Emergent Misalignment Persona

Anietta Weckauff, Yuchen Zhang, Maksym Andriushchenko · 2026

Fine-tuning large language models (LLMs) on narrowly misaligned data generalizes to broadly misaligned behavior, a phenomenon termed emergent misalignment (EM). While prior work has found a correlatio…

TopBench: A Benchmark for Implicit Prediction and Reasoning over Tabular Question Answering

An-Yang Ji, Jun-Peng Jiang, De-Chuan Zhan, Han-Jia Ye · 2026

Large Language Models (LLMs) have advanced Table Question Answering, where most queries can be answered by extracting information or simple aggregation. However, a common class of real-world queries i…

Repetition over Diversity: High-Signal Data Filtering for Sample-Efficient German Language Modeling

Ansar Aynetdinov, Patrick Haller, Alan Akbik · 2026

Recent research has shown that filtering massive English web corpora into high-quality subsets significantly improves training efficiency. However, for high-resource non-English languages like German,…

RHyVE: Competence-Aware Verification and Phase-Aware Deployment for LLM-Generated Reward Hypotheses

Feiyu Wu, Xu Zheng, Zhuocheng Wang, Yi ming Dai, Hui Li · 2026

Large language models (LLMs) make reward design in reinforcement learning substantially more scalable, but generated rewards are not automatically reliable training objectives. Existing work has focus…

Agent-Agnostic Evaluation of SQL Accuracy in Production Text-to-SQL Systems

Taslim Jamal Arif, Kuldeep Singh · 2026

Text-to-SQL (T2SQL) evaluation in production environments poses fundamental challenges that existing benchmarks do not address. Current evaluation methodologies, whether rule-based SQL matching or sche…

Stable Behavior, Limited Variation: Persona Validity in LLM Agents for Urban Sentiment Perception

Neemias B da Silva, Rodrigo Minetto, Daniel Silver, Thiago H Silva · 2026

Large Language Models (LLMs) are increasingly used as proxies for human perception in urban analysis, yet it remains unclear whether persona prompting produces meaningful and reproducible behavioral d…

Collaborative Agent Reasoning Engineering (CARE): A Three-Party Design Methodology for Systematically Engineering AI Agents with Subject Matter Experts, Developers, and Helper Agents

Rahul Ramachandran, Nidhi Jha, Muthukumaran Ramasubramanian · 2026

We present Collaborative Agent Reasoning Engineering (CARE), a disciplined methodology for engineering Large Language Model (LLM) agents in scientific domains. Unlike ad-hoc trial-and-error approaches…

SpecVQA: A Benchmark for Spectral Understanding and Visual Question Answering in Scientific Images

Jialu Shen, Han Lyu, Suyang Zhong, Hanzheng Li, Haoyi Tao, Nan Wang, Changhong Chen, Xi Fang · 2026

Spectra are a prevalent yet highly information-dense form of scientific imagery, presenting substantial challenges to multimodal large language models (MLLMs) due to their unstructured and domain-spec…

Ease of dependency distance minimization in star-like structures

Emilia Garcia-Casademont, Ramon Ferrer-i-Cancho · 2026

The syntactic structure of a sentence can be represented as a tree where edges indicate syntactic dependencies between words. When that structure is a star, it has been demonstrated that the head shou…

Models Recall What They Violate: Constraint Adherence in Multi-Turn LLM Ideation

Garvin Kruthof · 2026

When researchers iteratively refine ideas with large language models, do the models preserve fidelity to the original objective? We introduce DriftBench, a benchmark for evaluating constraint adherenc…

Reliable Answers for Recurring Questions: Boosting Text-to-SQL Accuracy with Template Constrained Decoding

Smit Jivani, Sarvam Maheshwari, Sunita Sarawagi · 2026

Large language models (LLMs) have revolutionized Text-to-SQL generation, allowing users to query structured data using natural language with growing ease. Yet, real-world deployment remains challengin…

Cost-Aware Learning

Clara Mohri, Amir Globerson, Haim Kaplan, Tomer Koren, Yishay Mansour · 2026

We consider the problem of Cost-Aware Learning, where sampling different component functions of a finite-sum objective incurs different costs. The objective is to reach a target error while minimizing…
