Visual Perception — Research Repository

AI & Data Science Preprint

TransVLM: A Vision-Language Framework and Benchmark for Detecting Any Shot Transitions

Ce Chen, Yi Ren, Yuanming Li, Viktor Goriachko, Zhenhui Ye, Zujin Guo, Zhibin Hong, Mingming Gong · 2026

Traditional Shot Boundary Detection (SBD) inherently struggles with complex transitions by formulating the task around isolated cut points, frequently yielding corrupted video shots. We address this f…

AI & Data Science Preprint

FineState-Bench: Benchmarking State-Conditioned Grounding for Fine-grained GUI State Setting

Fengxian Ji, Jingpu Yang, Zirui Song, Yuanxi Wang, Zhexuan Cui, Yuke Li, Qian Jiang, Xiuying Chen · 2026

Despite the rapid progress of large vision-language models (LVLMs), fine-grained, state-conditioned GUI interaction remains challenging. Current evaluations offer limited coverage, imprecise target-st…

AI & Data Science Preprint

From LLM-Driven Trading Card Generation to Procedural Relatedness: A Pokémon Case Study

Johannes Pfau, Panagiotis Vrettis · 2026

Since the dawn of Trading Card Games, the genre has grown into a multi-billion-dollar industry engaging millions of analog and digital players worldwide. Popular TCGs rely on regular updates, balance …

Computer Science Preprint

From Mirage to Grounding: Towards Reliable Multimodal Circuit-to-Verilog Code Generation

Guang Yang, Xing Hu, Xiang Chen, Xin Xi · 2026

Multimodal large language models (MLLMs) are increasingly used to translate visual artifacts into code, from UI mockups into HTML to scientific plots into Python scripts. A circuit diagram can be view…

AI & Data Science Preprint

ClimateVID -- Social Media Videos Analysis and Challenges Involved

Shiqi Xu, Moritz Burmester, Katharina Prasse, Isaac Bravo, Stefanie Walter, Margret Keuper · 2026

The pervasive growth of digital content, specifically short videos on social media platforms, has significantly altered how topics are discussed and understood in public discourse. In this work, we ad…

AI & Data Science Preprint

TripVVT: A Large-Scale Triplet Dataset and a Coarse-Mask Baseline for In-the-Wild Video Virtual Try-On

Dingbao Shao, Song Wu, Shenyi Wang, Ye Wang, Ziheng Tang, Fei Liu, Jiang Lin, Xinyu Chen, Qian Wang, Ying Tai, Jian Yang, Zili Yi · 2026

Due to the scarcity of large-scale in-the-wild triplet data and the improper use of masks, the performance of video virtual try-on models remains limited. In this paper, we first introduce TripVVT-1…

Computer Science Preprint

Real-Time Control of a Virtual Orchestra by Recognition of Conducting Gestures

Mert Mermerci, Emile Pascoe, Fredrik Edstrom, Hedvig Kjellstrom · 2026

We present a museum installation in a 180° dome theater, which gives the museum visitor the experience of conducting a symphony orchestra. We have pre-recorded a short music piece performed by a …

AI & Data Science Preprint

The Effects of Visual Priming on Cooperative Behavior in Vision-Language Models

Kenneth J. K. Ong · 2026

As Vision-Language Models (VLMs) become increasingly integrated into decision-making systems, it is essential to understand how visual inputs influence their behavior. This paper investigates the effe…

AI & Data Science Preprint

Training-Free Tunnel Defect Inspection and Engineering Interpretation via Visual Recalibration and Entity Reconstruction

Shipeng Liu, Liang Zhao, Dengfeng Chen, Zhanping Song · 2026

Tunnel inspection requires outputs that can support defect localization, measurement, severity grading, and engineering documentation. Existing training-free foundation-model pipelines usually stop at…

Physics Preprint

The Large Array Survey Telescope-Pipeline. II. Image Subtraction and Transient Detection

R. Konno, E. O. Ofek, A. Krassilchtchikov, Y. Shvartzvald, S. Ben-Ami, D. Polishook, C. Tishler, E. Segre, S. Garrappa, E. A. Zimmermann, A. Horowicz, P. Chen, A. Gal-Yam, M. Engel, Y. M. Shani, S. A. Spitzer, S. Fainer, O. Yaron, A. Blumenzweig · 2026

Context. The Large Array Survey Telescope (LAST) is a wide-field visual-band survey designed to explore the variable and transient sky with high cadence. Its raw data stream is automatically processed…

AI & Data Science Preprint

Graph World Models: Concepts, Taxonomy, and Future Directions

Jiawei Liu, Senqiao Yang, Mingjun Wang, Yu Wang, Bei Yu · 2026

As one of the mainstream models of artificial intelligence, world models allow agents to learn the representation of the environment for efficient prediction and planning. However, classical world mod…

Computer Science Preprint

D-Rex: Diffusion Rendering for Relightable Expressive Avatars

Timo Teufel, Xilong Zhou, Umar Iqbal, Jan Kautz, Marc Habermann, Vladislav Golyanik, Christian Theobalt · 2026

We present D-Rex, a person-specific framework for photorealistic, relightable, expressive, and animatable full-body human avatars with free-viewpoint rendering. Existing methods for relightable full-b…

Engineering Preprint

LRS-VoxMM: A benchmark for in-the-wild audio-visual speech recognition

Doyeop Kwak, Jeongsoo Choi, Suyeon Lee, Joon Son Chung · 2026

We introduce LRS-VoxMM, an in-the-wild benchmark for audio-visual speech recognition (AVSR). The benchmark is derived from VoxMM, a dataset of diverse real-world spoken conversations with human-annota…

Computer Science Preprint

NetSatBench: A Distributed LEO Constellation Emulator with an SRv6 Case Study

Andrea Detti, Shahram Dadras, Giuseppe Tropea · 2026

NetSatBench is a distributed emulation platform for evaluating communication protocols and application workloads over large-scale LEO satellite systems. Satellites, gateways, and user terminals are im…

Physics Preprint

Anisotropy of Satellite Galaxies-I: Contrasting Correlations with Central Galaxy, Host Halo, and Large-Scale Filament Structures

Zhuoming Zhang, Weiguang Cui, Yun Chen, Romeel Dave, Katarina Kraljic · 2026

Using the SIMBA, EAGLE, and IllustrisTNG-100 galaxy formation simulations, we examine the anisotropy of the satellite distribution and its dependencies on central galaxies, host halos, and cosmic fila…

Computer Science Preprint

Requirements Debt in AI-Enabled Perception Systems Development: An Industrial RE4AI Perspective

Hina Saeeda, Soniya Abraham · 2026

AI integration in automotive perception systems shifts requirements from static specifications to continuously evolving entities shaped by data, models, and operating contexts. When such changes are n…

AI & Data Science Preprint

Hyper-Dimensional Fingerprints as Molecular Representations

Jonas Teufel, Luca Torresi, Andre Eberhard, Pascal Friederich · 2026

Computational molecular representations underpin virtual screening, property prediction, and materials discovery. Conventional fingerprints are efficient and deterministic but lose structural informat…

Engineering Preprint

MotuBrain: An Advanced World Action Model for Robot Control

MotuBrain Team, Chendong Xiang, Fan Bao, Haitian Liu, Hengkai Tan, Hongzhe Bi, James Li, Jiabao Liu, Jingrui Pang, Kiro Jing, Louis Liu, Mengchen Cai, Rongxu Cui, Ruowen Zhao, Runqing Wang, Shuhe Huang, Yao Feng, Yinze Rong, Zeyuan Wang, Jun Zhu · 2026

Vision-Language-Action (VLA) models achieve strong semantic generalization but often lack fine-grained modeling of world dynamics. Recent work explores video generation models as a foundation for worl…

Physics Preprint

Leveraging natural fluctuations for matrix-based aberration correction in photoacoustic imaging

Yevgeny Slobodkin, Ori Katz · 2026

Photoacoustic imaging is the leading technique for deep tissue optical imaging, allowing single-shot imaging at depths. However, its resolution may be limited by acoustic aberrations, caused by natura…

AI & Data Science Preprint

Autonomous Traffic Signal Optimization Using Digital Twin and Agentic AI for Real-Time Decision-Making

Salman Jan, Toqeer Ali Syed, Shahid Kamal, Qamar Wali, Ali Akarma · 2026

This article outlines a new framework of traffic light optimization through a digital twin of the transport infrastructure, managed by agentic AI to ensure real-time autonomous decisions. The framewor…
