Ayush Tewari — Research Repository

AI & Data Science · Preprint

HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation

Xin Zhou, Dingkang Liang, Xiwu Chen, Feiyang Tan, Dingyuan Zhang, Hengshuang Zhao, Xiang Bai · 2026

Driving world models serve as a pivotal technology for autonomous driving by simulating environmental dynamics. However, existing approaches predominantly focus on future scene generation, often overl…

AI & Data Science · Preprint

Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling

Keming Wu, Zuhao Yang, Kaichen Zhang, Shizun Wang, Haowei Zhu, Sicong Leng, Zhongyu Yang, Qijie Wang, Sudong Wang, Ziting Wang, Zili Wang, Hui Zhang, Haonan Wang, Hang Zhou, Yifan Pu, Xingxuan Li, Fangneng Zhan, Bo Li, Lidong Bing, Yuxin Song, Ziwei Liu, Wenhu Chen, Jingdong Wang, Xinchao Wang, Xiaojuan Qi, Shijian Lu, Bin Wang · 2026

Recent visual generation models have made major progress in photorealism, typography, instruction following, and interactive editing, yet they still struggle with spatial reasoning, persistent state, …

Physics · Preprint

Uniaxial strain-driven ferroelastic domain control in LaAlO3

Matthias Roeper, Robin Buschbeck, Jakob Wetzel, Tobias Ritschel, Anna-Lena Hofmann, Vladyslav Kovtunovych, Mike N. Pionteck, Javier Taboada-Gutierrez, Alexey B. Kuzmenko, Martina Basini, Vivek Unikandanunni, Iuliia Kiseleva, Jochen Geck, Susanne C. Kehr, Maximilian Lederer, Simone Sanna, Lukas M. Eng, Samuel D. Seddon · 2026

Multiferroic domain walls in functional oxides exhibit properties distinct from the bulk and are increasingly exploited as active elements in nanoelectronic and photonic devices. Deterministic control…

AI & Data Science · Preprint

PhyCo: Learning Controllable Physical Priors for Generative Motion

Sriram Narayanan, Ziyu Jiang, Srinivasa Narasimhan, Manmohan Chandraker · 2026

Modern video diffusion models excel at appearance synthesis but still struggle with physical consistency: objects drift, collisions lack realistic rebound, and material responses seldom match their un…

AI & Data Science · Preprint

PRISM: Pre-alignment via Black-box On-policy Distillation for Multimodal Reinforcement Learning

Sudong Wang, Weiquan Huang, Xiaomin Yu, Zuhao Yang, Hehai Lin, Keming Wu, Chaojun Xiao, Chen Chen, Wenxuan Wang, Beier Zhu, Yunjian Zhang, Chengwei Qin · 2026

The standard post-training recipe for large multimodal models (LMMs) applies supervised fine-tuning (SFT) on curated demonstrations followed by reinforcement learning with verifiable rewards (RLVR). H…

Engineering · Preprint

GSDrive: Reinforcing Driving Policies by Multi-mode Trajectory Probing with 3D Gaussian Splatting Environment

Ziang Guo, Min Chen, Xuefeng Zhang, Yixiao Zhou, Zufeng Zhang, Dzmitry Tsetserukou · 2026

End-to-end (E2E) autonomous driving presents a promising approach for translating perceptual inputs directly into driving actions. However, prohibitive annotation costs and temporal data quality degra…

AI & Data Science · Preprint

Auto-FlexSwitch: Efficient Dynamic Model Merging via Learnable Task Vector Compression

Junqi Gao, Dazhi Zhang, Zhichang Guo, Biqing Qi, Yi Ran, Wangmeng Zuo · 2026

Model merging has attracted attention as an effective path toward multi-task adaptation by integrating knowledge from multiple task-specific models. Among existing approaches, dynamic merging mitigate…

AI & Data Science · Preprint

What Makes a Good Terminal-Agent Benchmark Task: A Guideline for Adversarial, Difficult, and Legible Evaluation Design

Ivan Bercovich · 2026

Terminal-agent benchmarks have become a primary signal for measuring the coding and system-administration capabilities of large language models. As the market for evaluation environments grows, so doe…

Computer Science · Preprint

Towards Neuro-symbolic Causal Rule Synthesis, Verification, and Evaluation Grounded in Legal and Safety Principles

Zainab Rehan, Christian Medeiros Adriano, Sona Ghahremani, Holger Giese · 2026

Rule-based systems remain central in safety-critical domains but often struggle with scalability, brittleness, and goal misspecification. These limitations can lead to reward hacking and failures in f…

AI & Data Science · Preprint

AesRM: Improving Video Aesthetics with Expert-Level Feedback

Yujin Han, Yujie Wei, Yefei He, Xinyu Liu, Tianle Li, Zichao Yu, Andi Han, Shiwei Zhang, Tingyu Weng, Difan Zou · 2026

Despite rapid advances in photorealistic video generation, real-world applications such as filmmaking require video aesthetics, e.g., harmonious colors and cinematic lighting, beyond visual fidelity. …

AI & Data Science · Preprint

3D Reconstruction Techniques in the Manufacturing Domain: Applications, Research Opportunities and Use Cases

Chialoon Cheng, Kaijun Liu, Zhiyang Liu, Marcelo H Ang Jr · 2026

This comprehensive review examines the evolution and the current state of the art in three-dimensional (3D) reconstruction techniques in manufacturing applications. The analysis covers both traditiona…

AI & Data Science · Preprint

RHyVE: Competence-Aware Verification and Phase-Aware Deployment for LLM-Generated Reward Hypotheses

Feiyu Wu, Xu Zheng, Zhuocheng Wang, Yi ming Dai, Hui Li · 2026

Large language models (LLMs) make reward design in reinforcement learning substantially more scalable, but generated rewards are not automatically reliable training objectives. Existing work has focus…

AI & Data Science · Preprint

SpecVQA: A Benchmark for Spectral Understanding and Visual Question Answering in Scientific Images

Jialu Shen, Han Lyu, Suyang Zhong, Hanzheng Li, Haoyi Tao, Nan Wang, Changhong Chen, Xi Fang · 2026

Spectra are a prevalent yet highly information-dense form of scientific imagery, presenting substantial challenges to multimodal large language models (MLLMs) due to their unstructured and domain-spec…

AI & Data Science · Preprint

Exponential families from a single KL identity

Marc Dymetman · 2026

Exponential families encompass the distributions central to modern machine learning -- softmax, Gaussians, and Boltzmann distributions -- and underlie the theory of variational inference, entropy-regu…

AI & Data Science · Preprint

Echo-α: Large Agentic Multimodal Reasoning Model for Ultrasound Interpretation

Jing Zhang, Wentao Jiang, Tao Huang, Zhiwei Wang, Jianxin Liu, Jian Chen, Ping Ye, Gang Wang, Zengmao Wang, Bo Du, Dacheng Tao · 2026

Ultrasound interpretation requires both precise lesion localization and holistic clinical reasoning, yet existing methods typically excel at only one of these capabilities: specialized detectors offer…

AI & Data Science · Preprint

Learning from Disagreement: Clinician Overrides as Implicit Preference Signals for Clinical AI in Value-Based Care

Prabhjot Singh, Abhishek Gupta, Chris Betz, Abe Flansburg, Brett Ives, Sudeep Lama, Jung Hoon Son · 2026

We reframe clinician overrides of clinical AI recommendations as implicit preference data - the same signal structure exploited by reinforcement learning from human feedback (RLHF), but richer: the an…

Engineering · Preprint

Dreaming Across Towns: Semantic Rollout and Town-Adversarial Regularization for Zero-Shot Held-Out-Town Fixed-Route Driving in CARLA

Feeza Khan Khanzada, Jaerock Kwon · 2026

Learned driving agents often degrade when deployed in unseen environments. This paper studies a deliberately bounded instance of that problem in the CARLA simulator: zero-shot transfer of a closed-loo…

Physics · Preprint

International Optical Clock Comparison Using the European Optical Fiber Network

Marco Pizzocaro, Clara Zyskind, Anne Amy-Klein, Erik Benkler, Sebastien Bize, Davide Calonico, Etienne Cantin, Christian Chardonnet, Cecilia Clivati, Stefano Condio, E. Anne Curtis, Simone Donadello, Soren Dorscher, Chen-Hao Feng, Melina Filzinger, Jacques-Olivier Gaudron, Rachel M. Godun, Irene Goti, Ian R. Hill, Wei Huang, Nils Huntemann, Matthew Johnson, Joshua Klose, Jochen Kronjager, Alexander Kuhl, Rodolphe Le Targat, Filippo Levi, Burghard Lipphardt, Christian Lisdat, Jerome Lodewyck, Olivier Lopez, Helen S. Margolis, Maxime Mazouth-Laurol, Alberto Mura, Benjamin Pointard, Paul-Eric Pottie, Matias Risaro, Billy I. Robertson, Marco Schioppo, Kilian Stahl, Martin Steinel, Alexandra Tofful, Mads Tønnes, Jacob Tunes · 2026

Optical clocks have achieved remarkable estimated fractional frequency uncertainties reaching the $10^{-18}$ level and below, enabling applications in fundamental physics, general relativity, and geod…

AI & Data Science · Preprint

GUI Agents with Reinforcement Learning: Toward Digital Inhabitants

Junan Hu, Jian Liu, Jingxiang Lai, Jiarui Hu, Yiwei Sheng, Shuang Chen, Jian Li, Dazhao Du, Song Guo · 2026

Graphical User Interface (GUI) agents have emerged as a promising paradigm for intelligent systems that perceive and interact with graphical interfaces visually. Yet supervised fine-tuning alone canno…

AI & Data Science · Preprint

The Effects of Visual Priming on Cooperative Behavior in Vision-Language Models

Kenneth J. K. Ong · 2026

As Vision-Language Models (VLMs) become increasingly integrated into decision-making systems, it is essential to understand how visual inputs influence their behavior. This paper investigates the effe…
