137,001+ open-access research outputs.
Radiology report generation (RRG) has emerged as a promising approach to alleviate radiologists' workload and reduce human errors by automatically generating diagnostic reports from medical images. A โฆ
Significance: Jacobians, or spatially resolved sensitivity profiles, are central to image reconstruction in model-based optical tomography of biological tissue. Although Monte Carlo (MC) simulations aโฆ
Automatically generating interactive 3D indoor scenes from natural language is crucial for virtual reality, gaming, and embodied AI. However, existing LLM-based approaches often suffer from spatial erโฆ
When the visual style of text is considered, a wide variety can be observed in font, color, and size. However, when a word is read, its meaning is independent of the style in which it has been writtenโฆ
While 3D Gaussian splatting (3DGS) offers explicit and efficient scene representations for cone-beam computed tomography reconstruction, conventional photometric optimization inherently suffers from sโฆ
Conventionally, Automatic Speech Recognition (ASR) systems are evaluated on their ability to correctly recognize each word contained in a speech signal. In this context, the word error rate (WER) metrโฆ
Automated plant recognition plays a crucial role in biodiversity monitoring and conservation, yet current approaches rely heavily on supervised learning, which is limited by the availability of expertโฆ
Recent generative models have shown strong performance in generating diverse 3D assets from 2D images, a fundamental research topic in computer vision and graphics. However, these models still strugglโฆ
Off-road nighttime autonomous driving suffers from unreliable visible-light perception, making infrared modality crucial for accurate freespace detection. However, progress remains limited due to the โฆ
Modeling 4D human-object interaction (HOI) is a compelling challenge in computer vision and an essential technology powering virtual and mixed-reality applications. While existing works have achieved โฆ
We introduce Skills-Coach, a novel automated framework designed to significantly enhance the self-evolution of skills within Large Language Model (LLM)-based agents. Addressing the current fragmentatiโฆ
Vision-Language-Action (VLA) models advance robotic control via strong visual-linguistic priors. However, existing VLAs predominantly frame pretraining as supervised behavior cloning, overlooking the โฆ
Volumetric videoconferencing enables immersive six Degrees of Freedom interactions by jointly transmitting visual appearance and 3D geometry. However, delivering volumetric video over today's networksโฆ
Multi-talker automatic speech recognition (ASR) in conversational recordings remains an open problem, particularly in scenarios with large portion of overlapping speech where identifying and transcribโฆ
With the advancement of multimodal large language models (MLLMs) and coding agents, the website development has shifted from manual programming to agent-based project-level code synthesis. Existing beโฆ
Vision-language models (VLMs) are increasingly used in autonomous driving because they combine visual perception with language-based reasoning, supporting more interpretable decision-making, yet theirโฆ
Visual model-based reinforcement learning (MBRL) agents can perform well on the training distribution, but often break down once the test environment shifts. In visual MBRL, recognizing that a shift hโฆ
Personalized cancer modeling for clinical applications requires robust and efficient parameter calibration, particularly in settings with limited patient data. This need is especially critical for medโฆ
Recent progress in multimodal large language models (MLLMs) has brought AI capabilities from static offline data processing to real-time streaming interaction, yet they still remain far from human-levโฆ
In recent years, Multimodal Large Language Models (MLLMs) have achieved remarkable progress on a wide range of multimodal benchmarks. Despite these advances, most existing benchmarks mainly focus on sโฆ
Free open-access publishing with Google Scholar indexing.
Submission Guide โ