Polina Kirichenko — Research Repository

AI & Data Science · Preprint

PRISM: Pre-alignment via Black-box On-policy Distillation for Multimodal Reinforcement Learning

Sudong Wang, Weiquan Huang, Xiaomin Yu, Zuhao Yang, Hehai Lin, Keming Wu, Chaojun Xiao, Chen Chen, Wenxuan Wang, Beier Zhu, Yunjian Zhang, Chengwei Qin · 2026

The standard post-training recipe for large multimodal models (LMMs) applies supervised fine-tuning (SFT) on curated demonstrations followed by reinforcement learning with verifiable rewards (RLVR). H…

AI & Data Science · Preprint

Latent-GRPO: Group Relative Policy Optimization for Latent Reasoning

Jingcheng Deng, Zihao Wei, Liang Pang, Junhong Wu, Shicheng Xu, Zenghao Duan, Huawei Shen · 2026

Latent reasoning offers a more efficient alternative to explicit reasoning by compressing intermediate reasoning into continuous representations and substantially shortening reasoning chains. However,…

AI & Data Science · Preprint

Knowledge Graph Representations for LLM-Based Policy Compliance Reasoning

Wilder Baldwin, Sepideh Ghanavati · 2026

The risks posed by AI features are increasing as they are rapidly integrated into software applications. In response, regulations and standards for safe and secure AI have been proposed. In this paper…

Engineering · Preprint

Can Tabular Foundation Models Guide Exploration in Robot Policy Learning?

Buqing Ou, Frederike Dumbgen · 2026

Policy optimization in high-dimensional continuous control for robotics remains a challenging problem. Predominant methods are inherently local and often require extensive tuning and carefully chosen …

AI & Data Science · Preprint

Bayesian policy gradient and actor-critic algorithms

Mohammad Ghavamzadeh, Yaakov Engel, Michal Valko · 2026

Policy gradient methods are reinforcement learning algorithms that adapt a parameterized policy by following a performance gradient estimate. Conventional policy gradient methods use Monte-Carlo techn…

AI & Data Science · Preprint

APPSI-139: A Parallel Corpus of English Application Privacy Policy Summarization and Interpretation

Pengyun Zhu, Qiheng Sun, Long Wen, Yanbo Wang, Yang Cao, Junxu Liu, Deyi Xiong, Jinfei Liu, Zhibo Wang, Kui Ren · 2026

Privacy policies are essential for users to understand how service providers handle their personal data. However, these documents are often long and complex, as well as filled with technobabble and le…

AI & Data Science · Preprint

Why Mean Pooling Works: Quantifying Second-Order Collapse in Text Embeddings

Tomomasa Hara, Hiroto Kurita, Masaaki Imaizumi, Kentaro Inui, Sho Yokoi · 2026

For constructing text embeddings, mean pooling, which averages token embeddings, is the standard approach. This paper examines whether mean pooling actually works well in real models. First, we note t…

AI & Data Science · Preprint

Co-Evolving Policy Distillation

Naibin Gu, Chenxu Yang, Qingyi Si, Chuanyu Qin, Dingyu Yao, Peng Fu, Zheng Lin, Weiping Wang, Nan Duan, Jiaqi Wang · 2026

RLVR and OPD have become standard paradigms for post-training. We provide a unified analysis of these two paradigms in consolidating multiple expert capabilities into a single model, identifying capab…

Mathematics · Preprint

Deep Policy Iteration for High-Dimensional Mean-Field Games with Regenerative Reformulation

Shuixin Fang, Shupeng Wang, Zhen Wu, Hui Zhang, Tao Zhou · 2026

This paper develops a deep policy iteration method for high-dimensional finite-horizon mean-field games. We reformulate the game as a regenerative problem with deterministic cycles, which allows polic…

AI & Data Science · Preprint

Preserving Disagreement: Architectural Heterogeneity and Coherence Validation in Multi-Agent Policy Simulation

Ariel Sela · 2026

Multi-agent deliberation systems using large language models (LLMs) are increasingly proposed for policy simulation, yet they suffer from artificial consensus: evaluator agents converge on the same op…

AI & Data Science · Preprint

TLPO: Token-Level Policy Optimization for Mitigating Language Confusion in Large Language Models

Jinho Choo, JunSeung Lee, Jimyeong Kim, Yeeho Song, S. K. Hong, Yeong-Dae Kwon · 2026

Large language models (LLMs) demonstrate strong multilingual capabilities, yet often fail to consistently generate responses in the intended language, exhibiting a phenomenon known as language confusi…

AI & Data Science · Preprint

When Errors Can Be Beneficial: A Categorization of Imperfect Rewards for Policy Gradient

Shuning Shang, Hubert Strauss, Stanley Wei, Sanjeev Arora, Noam Razin · 2026

Training language models via reinforcement learning often relies on imperfect proxy rewards, since ground truth rewards that precisely define the intended behavior are rarely available. Standard metri…

Engineering · Preprint

Reference-Augmented Learning for Precise Tracking Policy of Tendon-Driven Continuum Robots

Ziqing Zou, Ke Qiu, Haojian Lu, Rong Xiong, Yue Wang · 2026

Tendon-Driven Continuum Robots (TDCRs) pose significant control challenges due to their highly nonlinear, path-dependent dynamics and non-Markovian characteristics. Traditional Jacobian-based controll…

AI & Data Science · Preprint

Sample-efficient Neuro-symbolic Proximal Policy Optimization

Simone Murari, Celeste Veronese, Daniele Meli · 2026

Deep Reinforcement Learning (DRL) algorithms often require a large amount of data and struggle in sparse-reward domains with long planning horizons and multiple sub-goals. In this paper, we propose a …

Physics · Preprint

Piezomagnetic effect of a rare-earth-based altermagnet TbPt6Al3

Ryohei Oishi, Kazunori Umeo, Takuya Aoyama, Takahiro Onimaru, Kaya Kobayashi · 2026

We have investigated the piezomagnetic (PZM) effect of the rare-earth-based g-wave altermagnet TbPt6Al3 by magnetization measurements of single-crystalline samples under uniaxial stress σ. The mag…

AI & Data Science · Preprint

BARRED: Synthetic Training of Custom Policy Guardrails via Asymmetric Debate

Arnon Mazza, Elad Levi · 2026

Deploying guardrails for custom policies remains challenging, as generic safety models fail to capture task-specific requirements, while prompting LLMs suffers from inconsistent boundary-case performa…

Physics · Preprint

Adaptive Sensing beyond Non-Adaptive Information Limits: End-to-End Co-Design of Geometry, Policy, and Inference

Arvin Keshvari, William Tuxbury, Zin Lin · 2026

Inverse design has made vast physical parameter spaces a substrate for emergent behavior. In sensing, the stakes of this principle are sharpest at the analog-to-digital boundary, where any information…

AI & Data Science · Preprint

Frictive Policy Optimization for LLMs: Epistemic Intervention, Risk-Sensitive Control, and Reflective Alignment

James Pustejovsky, Nikhil Krishnaswamy · 2026

We propose Frictive Policy Optimization (FPO), a framework for learning language model policies that regulate not only what to say, but when and how to intervene in order to manage epistemic and norma…

Computer Science · Preprint

Spark Policy Toolkit: Semantic Contracts and Scalable Execution for Policy Learning in Spark

Zeyu Bai · 2026

Custom policy-learning pipelines in Spark fail for two coupled systems reasons: rowwise Python execution makes inference impractical, and driver-side candidate materialization makes split search fragi…

AI & Data Science · Preprint

DPEPO: Diverse Parallel Exploration Policy Optimization for LLM-based Agents

Junshuo Zhang, Chengrui Huang, Feng Guo, Zihan Li, Ke Shi, Menghua Jiang, Jiguo Yu, Shuo Shang, Shen Gao · 2026

Large language model (LLM) agents that follow the sequential "reason-then-act" paradigm have achieved superior performance in many complex tasks. However, these methods suffer from limited exploration …
