137,001+ open-access research outputs.
Traditional Shot Boundary Detection (SBD) inherently struggles with complex transitions by formulating the task around isolated cut points, frequently yielding corrupted video shots. We address this fโฆ
Despite the rapid progress of large vision-language models (LVLMs), fine-grained, state-conditioned GUI interaction remains challenging. Current evaluations offer limited coverage, imprecise target-stโฆ
Since the dawn of Trading Card Games, the genre has grown into a multi-billion-dollar industry engaging millions of analog and digital players worldwide. Popular TCGs rely on regular updates, balance โฆ
Multimodal large language models (MLLMs) are increasingly used to translate visual artifacts into code, from UI mockups into HTML to scientific plots into Python scripts. A circuit diagram can be viewโฆ
The pervasive growth of digital content, specifically short videos on social media platforms, has significantly altered how topics are discussed and understood in public discourse. In this work, we adโฆ
Due to the scarcity of large-scale in-the-wild triplet data and the improper use of masks, the performance of video virtual try-on models remains limited. In this paper, we first introduce **TripVVT-1โฆ
We present a museum installation in a 180{\deg} dome theater, which gives the museum visitor the experience of conducting a symphony orchestra. We have pre-recorded a short music piece performed by a โฆ
As Vision-Language Models (VLMs) become increasingly integrated into decision-making systems, it is essential to understand how visual inputs influence their behavior. This paper investigates the effeโฆ
Tunnel inspection requires outputs that can support defect localization, measurement, severity grading, and engineering documentation. Existing training-free foundation-model pipelines usually stop atโฆ
Context. The Large Array Survey Telescope (LAST) is a wide-field visual-band survey designed to explore the variable and transient sky with high cadence. Its raw data stream is automatically processedโฆ
As one of the mainstream models of artificial intelligence, world models allow agents to learn the representation of the environment for efficient prediction and planning. However, classical world modโฆ
We present D-Rex, a person-specific framework for photorealistic, relightable, expressive, and animatable full-body human avatars with free-viewpoint rendering. Existing methods for relightable full-bโฆ
We introduce LRS-VoxMM, an in-the-wild benchmark for audio-visual speech recognition (AVSR). The benchmark is derived from VoxMM, a dataset of diverse real-world spoken conversations with human-annotaโฆ
NetSatBench is a distributed emulation platform for evaluating communication protocols and application workloads over large-scale LEO satellite systems. Satellites, gateways, and user terminals are imโฆ
Using the SIMBA, EAGLE, and IllustrisTNG-100 galaxy formation simulations, we examine the anisotropy of the satellite distribution and its dependencies on central galaxies, host halos, and cosmic filaโฆ
AI integration in automotive perception systems shifts requirements from static specifications to continuously evolving entities shaped by data, models, and operating contexts. When such changes are nโฆ
Computational molecular representations underpin virtual screening, property prediction, and materials discovery. Conventional fingerprints are efficient and deterministic but lose structural informatโฆ
Vision-Language-Action (VLA) models achieve strong semantic generalization but often lack fine-grained modeling of world dynamics. Recent work explores video generation models as a foundation for worlโฆ
Photoacoustic imaging is the leading technique for deep tissue optical imaging, allowing single-shot imaging at depths. However, its resolution may be limited by acoustic aberrations, caused by naturaโฆ
This article outlines a new framework of traffic light optimization through a digital twin of the transport infrastructure, managed by agentic AI to ensure real-time autonomous decisions. The frameworโฆ
Free open-access publishing with Google Scholar indexing.
Submission Guide โ