Scott Anthony Sisson — Research Repository

Computer Science Preprint PDF DOI

Essential, Yet Overlooked: Identity Verification Barriers for Blind and Low Vision People in Government Services

Ryan John Oommen, Tanusree Sharma · 2026

Identity verification is a critical gateway to accessing government services and public benefits, yet contemporary systems are typically designed around visual interaction, leaving blind and low visio…

AI & Data Science Preprint PDF DOI

Beyond Gaussian Bottlenecks: Topologically Aligned Encoding of Vision-Transformer Feature Spaces

Andrew Bond, Ilkin Umut Melanlioglu, Erkut Erdem, Aykut Erdem · 2026

Modern visual world modeling systems increasingly rely on high-capacity architectures and large-scale data to produce plausible motion, yet they often fail to preserve underlying 3D geometry or physic…

Mathematics Preprint PDF DOI

Gauge symmetry and uniqueness in inverse problems for the JMGT equation

Dong Qiu, Xiang Xu, Yeqiong Ye, Ting Zhou · 2026

In this paper, we study an inverse boundary value problem for the Jordan–Moore–Gibson–Thompson equation on a simple Riemannian manifold. We consider an all boundary measurement map that maps Dirich…

AI & Data Science Preprint PDF DOI

TransVLM: A Vision-Language Framework and Benchmark for Detecting Any Shot Transitions

Ce Chen, Yi Ren, Yuanming Li, Viktor Goriachko, Zhenhui Ye, Zujin Guo, Zhibin Hong, Mingming Gong · 2026

Traditional Shot Boundary Detection (SBD) inherently struggles with complex transitions by formulating the task around isolated cut points, frequently yielding corrupted video shots. We address this f…

Computer Science Preprint PDF DOI

From Mirage to Grounding: Towards Reliable Multimodal Circuit-to-Verilog Code Generation

Guang Yang, Xing Hu, Xiang Chen, Xin Xi · 2026

Multimodal large language models (MLLMs) are increasingly used to translate visual artifacts into code, from UI mockups into HTML to scientific plots into Python scripts. A circuit diagram can be view…

AI & Data Science Preprint PDF DOI

The Effects of Visual Priming on Cooperative Behavior in Vision-Language Models

Kenneth J. K. Ong · 2026

As Vision-Language Models (VLMs) become increasingly integrated into decision-making systems, it is essential to understand how visual inputs influence their behavior. This paper investigates the effe…

AI & Data Science Preprint PDF DOI

Dynamic Cluster Data Sampling for Efficient and Long-Tail-Aware Vision-Language Pre-training

Mingliang Liang, Zhuoran Liu, Arjen P. de Vries, Martha Larson · 2026

The computational cost of training a vision-language model (VLM) can be reduced by sampling the training data. Previous work on efficient VLM pre-training has pointed to the importance of semantic dat…

AI & Data Science Preprint PDF DOI

Focus Session: Autonomous Systems Dependability in the era of AI: Design Challenges in Safety, Security, Reliability and Certification

Behnaz Ranjbar, Kirankumar Raveendiran, Sudeep Pasricha, Samarjit Chakraborty, Cecilia Carbonelli, Akash Kumar · 2026

The design of embedded safety-critical systems, such as those used in next-generation automotive and autonomous platforms, is increasingly challenged by escalating system complexity, hardware-software …

Physics Preprint PDF DOI

Macroscopic photon counting beating the Poisson noise limit

Timon Schapeler, Fabian Schlue, Isabell Mischke, Michael Stefszky, Benjamin Brecht, Christine Silberhorn, Tim J. Bartley · 2026

Photon counting is a cornerstone of quantum optics. Here, we demonstrate precisely counting from 0 to over 9000 photons, beating the Poisson noise limit by at least $4.1~\mathrm{dB}$ across this range…

AI & Data Science Preprint PDF DOI

Auditing Frontier Vision-Language Models for Trustworthy Medical VQA: Grounding Failures, Format Collapse, and Domain Adaptation

Xupeng Chen, Binbin Shi, Chenqian Le, Qifu Yin, Lang Lin, Haowei Ni, Ran Gong, Panfeng Li · 2026

Deploying vision-language models (VLMs) in clinical settings demands auditable behavior under realistic failure conditions, yet the failure landscape of frontier VLMs on specialized medical inputs is …

AI & Data Science Preprint PDF DOI

Improving Calibration in Test-Time Prompt Tuning for Vision-Language Models via Data-Free Flatness-Aware Prompt Pretraining

Hyeonseo Jang, Jaebyeong Jeon, Joong-Won Hwang, Kibok Lee · 2026

Test-time prompt tuning (TPT) has emerged as a promising technique for enhancing the adaptability of vision-language models by optimizing textual prompts using unlabeled test data. However, prior stud…

AI & Data Science Preprint PDF DOI

SpaAct: Spatially-Activated Transition Learning with Curriculum Adaptation for Vision-Language Navigation

Pengna Li, Kangyi Wu, Shaoqing Xu, Fang Li, Hanbing Li, Lin Zhao, Kailin Lyu, Long Chen, Zhi-Xin Yang, Nanning Zheng · 2026

Vision-and-Language Navigation (VLN) aims to enable an embodied agent to follow natural-language instructions and navigate to a target location in unseen 3D environments. We argue that adapting VLMs t…

Mathematics Preprint PDF DOI

Discontinuous Galerkin IMEX Pressure Correction Scheme for the Poisson-Nernst-Planck-Navier-Stokes Equations

Bikram Bir, Amiya K. Pani · 2026

Based on a discontinuous Galerkin method in the spatial directions and an improved implicit-explicit pressure-correction scheme in the temporal direction, this paper discusses a fully discrete scheme …

AI & Data Science Preprint PDF DOI

EdgeFM: Efficient Edge Inference for Vision-Language Models

Mengling Deng, Yuanpeng Chen, Sheng Yang, Wei Tao, Wenhai Zhang, Hui Song, Linyuanhao Qin, Kai Zhao, Xiaojun Ye, Shanhui Mo, Jingli Fan, Shuang Zhang, Bei Liu, Tiankun Zhao, Xiangjing An · 2026

Vision-language models (VLMs) have demonstrated strong applicability in edge industrial applications, yet their deployment remains severely constrained by requirements for deterministic low latency an…

AI & Data Science Preprint PDF DOI

Understanding Adversarial Transferability in Vision-Language Models for Autonomous Driving: A Cross-Architecture Analysis

David Fernandez, Pedram MohajerAnsari, Amir Salarpour, Mert D. Pese · 2026

Vision-language models (VLMs) are increasingly used in autonomous driving because they combine visual perception with language-based reasoning, supporting more interpretable decision-making, yet their…

AI & Data Science Preprint PDF DOI

Judge, Then Drive: A Critic-Centric Vision Language Action Framework for Autonomous Driving

Lijin Yang, Jianing Huang, Zhongzhan Huang, Shu Liu, Hao Yang · 2026

Recent advances in vision language action (VLA) models have shown remarkable potential for autonomous driving by directly mapping multimodal inputs to control signals. However, previous VLA-based meth…

Physics Preprint PDF DOI

Hindered Prompt-Neutron Evaporation in Surrogate Reactions for $^{239}$Pu(n,f)

D. Ramos, M. Caamano, F. Farget, C. Rodriguez-Tajes, A. Lemasson, M. Rejmund, C. Schmitt, E. Clement, O. Litaize, O. Serot, L. Audouin, J. Benlliure, E. Casarejos, D. Cortina, D. Dore, B. Fernandez-Dominguez, G. de France, A. Heinz, B. Jacquot, C. Paradela, T. Roger · 2026

Isotopic fission-fragment distributions of $^{240}$Pu have been measured, for the first time, as a function of the initial excitation energy, and the prompt neutron multiplicity has been derived from …

AI & Data Science Preprint PDF DOI

Three-Step Nav: A Hierarchical Global-Local Planner for Zero-Shot Vision-and-Language Navigation

Wanrong Zheng, Yunhao Ge, Laurent Itti · 2026

Breakthrough progress in vision-based navigation through unknown environments has been achieved by using multimodal large language models (MLLMs). These models can plan a sequence of motions by evalua…

AI & Data Science Preprint PDF DOI

TAP into the Patch Tokens: Leveraging Vision Foundation Model Features for AI-Generated Image Detection

Ahmed Abdullah, Nikolas Ebert, Oliver Wasenmuller · 2026

Recent methods demonstrate that large-scale pretrained models, such as CLIP vision transformers, effectively detect AI-generated images (AIGIs) from unseen generative models when used as feature extra…

Computer Science Preprint PDF DOI

Distributed Multi-View Vision-Only RSSI Estimation

Jung-Beom Kim, Woongsup Lee · 2026

Received Signal Strength Indicator (RSSI) estimation is essential for wireless link management, yet conventional feedback-based approaches incur uplink overhead, suffer from measurement instability, a…
