Jaime Carbonell — Research Repository

AI & Data Science · Preprint

Latent-GRPO: Group Relative Policy Optimization for Latent Reasoning

Jingcheng Deng, Zihao Wei, Liang Pang, Junhong Wu, Shicheng Xu, Zenghao Duan, Huawei Shen · 2026

Latent reasoning offers a more efficient alternative to explicit reasoning by compressing intermediate reasoning into continuous representations and substantially shortening reasoning chains. However,…

AI & Data Science · Preprint

SciEval: A Benchmark for Automatic Evaluation of K-12 Science Instructional Materials

Zhaohui Li, Peng He, Zhiyuan Chen, Honglu Liu, Zeyuan Wang, Tingting Li, Jinjun Xiong · 2026

The need to evaluate instructional materials for K-12 science education has become increasingly important, as more educators use generative AI to create instructional materials. However, the review of…

Computer Science · Preprint

Characterizing Streaming Decidability of CSPs via Non-Redundancy

Amatya Sharma, Santhoshini Velusamy · 2026

We study the single-pass streaming complexity of deciding satisfiability of Constraint Satisfaction Problems (CSPs). A CSP is specified by a constraint language $\Gamma$, that is, a finite set of $k$-…

AI & Data Science · Preprint

Process Supervision via Verbal Critique Improves Reasoning in Large Language Models

Hao-Yuan Chen · 2026

Inference-time scaling for LLM reasoning has focused on three axes: chain depth, sample breadth, and learned step-scorers (PRMs). We introduce a fourth axis, granularity of external verbal supervision…

AI & Data Science · Preprint

TRACES: Tagging Reasoning Steps for Adaptive Cost-Efficient Early-Stopping

Yannis Belkhiter, Seshu Tirupathi, Giulio Zizzo, John D. Kelleher · 2026

The field of Language Reasoning Models (LRMs) has been very active over the past few years, with advances in training and inference techniques enabling LRMs to reason longer and more accurately. Howev…

AI & Data Science · Preprint

TEMPO: Scaling Test-time Training for Large Reasoning Models

Qingyang Zhang, Xinke Kong, Haitao Wu, Qinghua Hu, Minghao Wu, Baosong Yang, Yu Cheng, Yun Luo, Ganqu Cui, Changqing Zhang · 2026

Test-time training (TTT) adapts model parameters on unlabeled test instances during inference time, which continuously extends capabilities beyond the reach of offline training. Despite initial gains,…

AI & Data Science · Preprint

Neural Garbage Collection: Learning to Forget while Learning to Reason

Michael Y. Li, Jubayer Ibn Hamid, Emily B. Fox, Noah D. Goodman · 2026

Chain-of-thought reasoning has driven striking advances in language model capability, yet every reasoning step grows the KV cache, creating a bottleneck to scaling this paradigm further. Current appro…

AI & Data Science · Preprint

MoE-nD: Per-Layer Mixture-of-Experts Routing for Multi-Axis KV Cache Compression

Libo Sun, Peixiong He, Po-Wei Harn, Xiao Qin · 2026

KV cache memory is the dominant bottleneck for long-context LLM inference. Existing compression methods each act on a single axis of the four-dimensional KV tensor -- token eviction (sequence), quanti…

Physics · Preprint

Electron-Impact Quasi-Resonant Ion-Pair Dissociation of OCS: A Velocity Slice Imaging Study with Partial Wave Analysis

Narayan Kundu, Soumya Ghosh, Dhananjay Nandi · 2026

We present velocity map imaging data on intramolecular ion-pair dissociation (IPD) of carbonyl sulfide (OCS) induced by electron impact over the 20 eV to 45 eV energy range. Two distinct IPD pathways …

AI & Data Science · Preprint

TIP: Token Importance in On-Policy Distillation

Yuanda Xu, Hejian Sang, Zhengze Zhou, Ran He, Zhipeng Wang, Alborz Geramifard · 2026

On-policy knowledge distillation (OPD) trains a student on its own rollouts under token-level supervision from a teacher. Not all token positions matter equally, but existing views of token importance…

AI & Data Science · Preprint

Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation

Yecheng Wu, Song Han, Hai Cai · 2026

On-policy distillation (OPD) has emerged as an efficient post-training paradigm for large language models. However, standard OPD requires a live teacher inference server throughout training, resulting…

AI & Data Science · Preprint

Introspective Diffusion Language Models

Yifan Yu, Yuqing Jian, Junxiong Wang, Zhongzhu Zhou, Donglin Zhuang, Xinyu Fang, Sri Yanamandra, Xiaoxia Wu, Qingyang Wu, Shuaiwen Leon Song, Tri Dao, Ben Athiwaratkun, James Zou, Fan Lai, Chenfeng Xu · 2026

Diffusion language models promise parallel generation, yet still lag behind autoregressive (AR) models in quality. We trace this gap to a failure of introspective consistency: AR models agree with thei…

AI & Data Science · Preprint

$p1$: Better Prompt Optimization with Fewer Prompts

Zhaolin Gao, Yu (Sid) Wang, Bo Liu, Thorsten Joachims, Kiante Brantley, Wen Sun · 2026

Prompt optimization improves language models without updating their weights by searching for a better system prompt, but its effectiveness varies widely across tasks. We study what makes a task amenab…

AI & Data Science · Preprint

Mitigating Distribution Sharpening in Math RLVR via Distribution-Aligned Hint Synthesis and Backward Hint Annealing

Pei-Xi Xie, Che-Yu Lin, Cheng-Lin Yang · 2026

Reinforcement learning with verifiable rewards (RLVR) can improve low-$k$ reasoning accuracy while narrowing solution coverage on challenging math questions, and pass@1 gains do not necessarily transl…

AI & Data Science · Preprint

Squeeze Evolve: Unified Multi-Model Orchestration for Verifier-Free Evolution

Monishwaran Maheswaran, Leon Lakhani, Zhongzhu Zhou, Shijia Yang, Junxiong Wang, Coleman Hooper, Yuezhou Hu, Rishabh Tiwari, Jue Wang, Harman Singh, Qingyang Wu, Yuqing Jian, Ce Zhang, Kurt Keutzer, Tri Dao, Xiaoxia Wu, Ben Athiwaratkun, James Zou, Chenfeng Xu · 2026

We show that verifier-free evolution is bottlenecked by both diversity and efficiency: without external correction, repeated evolution accelerates collapse toward narrow modes, while the uniform use o…

AI & Data Science · Preprint

Apriel-1.5-OpenReasoner: RL Post-Training for General-Purpose and Efficient Reasoning

Rafael Pardinas, Ehsan Kamalloo, David Vazquez, Alexandre Drouin · 2026

Building general-purpose reasoning models using reinforcement learning with verifiable rewards (RLVR) across diverse domains has been widely adopted by frontier open-weight models. However, their trai…

AI & Data Science · Preprint

ThinkTwice: Jointly Optimizing Large Language Models for Reasoning and Self-Refinement

Difan Jiao, Qianfeng Wen, Blair Yang, Zhenwei Tang, Ashton Anderson · 2026

We introduce ThinkTwice, a simple two-phase framework that jointly optimizes LLMs to solve reasoning problems and refine the answers, based on Group Relative Policy Optimization (GRPO). In each pair o…

AI & Data Science · Preprint

Avoiding Overthinking and Underthinking: Curriculum-Aware Budget Scheduling for LLMs

Amirul Rahman, Aisha Karim, Kenji Nakamura, Yi-Fan Ng · 2026

Scaling test-time compute via extended reasoning has become a key paradigm for improving the capabilities of large language models (LLMs). However, existing approaches optimize reasoning under fixed o…

Physics · Preprint

Surface mechanisms governing long-term stability of GEM detectors in CO$_2$-based gaseous mixtures

Tiago F. Silva, Thiago B. Saramela, Willian W.R.A. da Silva, Camilla de S. Codeco, Maria do C. M. Alves, Jonder Morais, Niklaus U. Wetter, Anderson Z. de Freitas · 2026

Understanding the chemical stability of Gas Electron Multipliers (GEMs) operated in CO$_2$-based mixtures is essential for improving detector longevity and reliability. In this work, we investigate th…

AI & Data Science · Preprint

Robust Reasoning Benchmark

Pavel Golikov, Evgenii Opryshko, Gennady Pekhimenko, Mark C. Jeffrey · 2026

While Large Language Models (LLMs) achieve high performance on standard mathematical benchmarks, their underlying reasoning processes remain highly overfit to standard textual formatting. We propose a…