Visual Perception — Research Repository

AI & Data Science · Preprint

Consumer Attitudes Towards AI in Digital Health: A Mixed-Methods Survey in Australia

Wei Zhou, Rashina Hoda, Joycelyn Ling · 2026

AI applications are increasingly being introduced into digital health. While technical performance has advanced rapidly, successful deployment mainly depends on consumer attitudes, especially to patie…

Engineering · Preprint

Connected Dependability Cage: Run-Time Function and Anomaly Monitoring for the Development and Operation of Safe Automated Vehicles

Iqra Aslam, Nour Habib, Abhishek Buragohain, Meng Zhang, Andreas Rausch, Vaibhav Tiwari, Mohamed Benchat · 2026

The advancement of automated vehicles introduces complex safety challenges, particularly in dynamic and unpredictable environments where AI-enabled perception systems must operate reliably. Ensuring c…

AI & Data Science · Preprint

Iterative Multimodal Retrieval-Augmented Generation for Medical Question Answering

Xupeng Chen, Binbin Shi, Chenqian Le, Jiaqi Zhang, Kewen Wang, Ran Gong, Jinhan Zhang, Chihang Wang · 2026

Medical retrieval-augmented generation (RAG) systems typically operate on text chunks extracted from biomedical literature, discarding the rich visual content (tables, figures, structured layouts) of …

AI & Data Science · Preprint

Auditing Frontier Vision-Language Models for Trustworthy Medical VQA: Grounding Failures, Format Collapse, and Domain Adaptation

Xupeng Chen, Binbin Shi, Chenqian Le, Qifu Yin, Lang Lin, Haowei Ni, Ran Gong, Panfeng Li · 2026

Deploying vision-language models (VLMs) in clinical settings demands auditable behavior under realistic failure conditions, yet the failure landscape of frontier VLMs on specialized medical inputs is …

AI & Data Science · Preprint

Linguistically Informed Multimodal Fusion for Vietnamese Scene-Text Image Captioning: Dataset, Graph Framework, and Phonological Attention

Nhi Ngoc-Yen Nguyen, Anh-Duc Nguyen, Nghia Hieu Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen · 2026

Scene-text image captioning requires fusing three information streams -- visual features, OCR-detected text, and linguistic knowledge -- to generate descriptions that faithfully integrate text visible…

Computer Science · Preprint

Users' Activity Logs: the Good, the Bad, the Misconception, and the Disastrous

Eman Alashwali · 2026

Most service providers, such as Google, save logs from data generated by users while using the service. Many service providers provide users with privacy controls to manage whether, how, and for how l…

Physics · Preprint

Neuronal arithmetic operators based on Ovonic threshold switches (OTS) for biologically inspired analog computing

Jingyeong Hwang, Jaesang Lee, Jiin Bang, Younghyun Lee, Unhyeon Kang, Seungmin Oh, Kyungmin Lee, Jaehyun Park, Seongsik Park, Hyun Jae Jang, Sangbum Kim, Min Hyuk Park, Suyoun Lee · 2026

Biological neurons perform arithmetic computations - including additive integration and divisive gain modulation - through synaptic conductance changes and shunting inhibition, enabling context-depend…

Biology & Life Sciences · Preprint

Benchmarking virtual cell models for in-the-wild perturbation response

Xinjie Mao, Songming Zhang, Qianhong Wen, Xiangyu Wen, Kedu Jin, Hao Wu, Shuizhou Chen, Yuqiang Li, Lei Bai, Qi Liu, Ning Ding, Siqi Sun, Zhangyang Gao · 2026

Virtual cell (VC) models aim to predict cellular responses to any perturbations in silico and have emerged as a promising approach for drug discovery and precision medicine. Yet, a clear gap still rem…

AI & Data Science · Preprint

WaferSAGE: Large Language Model-Powered Wafer Defect Analysis via Synthetic Data Generation and Rubric-Guided Reinforcement Learning

Ke Xu · 2026

We present WaferSAGE, a framework for wafer defect visual question answering using small vision-language models. To address data scarcity in semiconductor manufacturing, we propose a three-stage synth…

AI & Data Science · Preprint

SpaAct: Spatially-Activated Transition Learning with Curriculum Adaptation for Vision-Language Navigation

Pengna Li, Kangyi Wu, Shaoqing Xu, Fang Li, Hanbing Li, Lin Zhao, Kailin Lyu, Long Chen, Zhi-Xin Yang, Nanning Zheng · 2026

Vision-and-Language Navigation (VLN) aims to enable an embodied agent to follow natural-language instructions and navigate to a target location in unseen 3D environments. We argue that adapting VLMs t…

AI & Data Science · Preprint

Math Education Digital Shadows for facilitating learning with LLMs: Math performance, anxiety and confidence in simulated students and AIs

Naomi Esposito, Anthony Tricarico, Luisa Porzio, Ali Aghazadeh Ardebili, Massimo Stella · 2026

To enhance LLMs' impact on math education, we need data on their mathematical prowess and biases across prompts. To fill this gap, we introduce MEDS (Math Education Digital Shadows) as a dataset mappi…

AI & Data Science · Preprint

Decoding Scientific Experimental Images: The SPUR Benchmark for Perception, Understanding, and Reasoning

Junpeng Ding, Zichen Tang, Haihong E, Mengyuan Ji, Yang Liu, Haolin Tian, Haiyang Sun, Pengqi Sun, Yang Xu, Yichen Liu, Haocheng Gao, Zijie Xi, Ruomeng Jiang, Peizhi Zhao, Rongjin Li, Yuanze Li, Jiacheng Liu, Zhongjun Yang, Jintong Chen, Siying Lin · 2026

We introduce SPUR, a comprehensive benchmark for scientific experimental image perception, understanding, and reasoning, comprising 4,264 question-answering (QA) pairs derived from 1,084 expert-curate…

Computer Science · Preprint

Purifying Multimodal Retrieval: Fragment-Level Evidence Selection for RAG

Xihang Wang, Zihan Wang, Chengkai Huang, Cao Liu, Ke Zeng, Quan Z. Sheng, Lina Yao · 2026

Multimodal Retrieval-Augmented Generation (MRAG) is widely adopted for Multimodal Large Language Models (MLLMs) with external evidence to reduce hallucinations. Despite its success, most existing MRAG…

AI & Data Science · Preprint

ClipTBP: Clip-Pair based Temporal Boundary Prediction with Boundary-Aware Learning for Moment Retrieval

Ji-Hyeon Kim, Ho-Joong Kim, Seong-Whan Lee · 2026

Video moment retrieval is the task of retrieving specific segments of a video corresponding to a given text query. Recent studies have been conducted to improve multimodal alignment performance throug…

AI & Data Science · Preprint

Fake3DGS: A Benchmark for 3D Manipulation Detection in Neural Rendering

Davide Di Nucci, Riccardo Catalini, Guido Borghi, Roberto Vezzani · 2026

Recent advances in 3D reconstruction and neural rendering, particularly 3D Gaussian Splatting, make it feasible and simple to edit 3D scenes and re-render them as highly realistic images. Therefore, se…

Neuroscience · Preprint

Simulating Infant First-Person Sensorimotor Experience via Motion Retargeting from Babies to Humanoids

Francisco M. Lopez, Hoshinori Kanazawa, Ondrej Fiala, Yakov Balashov, Valentin Marcel, Lukas Rustler, Miles Lenz, Dongmin Kim, Yasuo Kuniyoshi, Jochen Triesch, Matej Hoffmann · 2026

Motion retargeting from humans to human-like artificial agents is becoming increasingly important as humanoid robots grow more capable. However, most existing approaches focus only on reproducing kine…

AI & Data Science · Preprint

World2Minecraft: Occupancy-Driven Simulated Scenes Construction

Lechao Zhang, Haoran Xu, Jingyu Gong, Xuhong Wang, Yuan Xie, Xin Tan · 2026

Embodied intelligence requires high-fidelity simulation environments to support perception and decision-making, yet existing platforms often suffer from data contamination and limited flexibility. To …

Computer Science · Preprint

SandSim: Curve-Guided Gaussian Splatting for Reconstructing Sand Painting Processes

Yilin Wang, Haojie Huang, Chen Li, Yang Li, Changbo Wang, Chenhui Li · 2026

Sand painting is a process-driven art where visual appearance emerges from granular accumulation. Given a single image, reconstructing a plausible sand painting process requires modeling coherent stro…

Computer Science · Preprint

treVM: Tiny Rust Embedded Virtual Machines with WASM on Variable Resource-Constrained Hardware

Antoine Lavandier, Bastien Buil, Chrystel Gaber, Emmanuel Baccelli · 2026

Software stacks embedded on microcontroller-based hardware typically provide rudimentary APIs programmed in C/C++, basic connectivity and, sometimes, a firmware update mechanism. Such coarse mechanism…

AI & Data Science · Preprint

Online semi-supervised perception: Real-time learning without explicit feedback

Branislav Kveton, Michal Valko, Matthai Phillipose, Ling Huang · 2026

This paper proposes an algorithm for real-time learning without explicit feedback. The algorithm combines the ideas of semi-supervised learning on graphs and online learning. In particular, it iterati…
