17,900+ open-access research outputs.
Vision-Language-Action (VLA) models have increasingly incorporated reasoning mechanisms for complex robotic manipulation. However, existing approaches share a critical limitation: whether employing exโฆ
We introduce LRS-VoxMM, an in-the-wild benchmark for audio-visual speech recognition (AVSR). The benchmark is derived from VoxMM, a dataset of diverse real-world spoken conversations with human-annotaโฆ
The advancement of automated vehicles introduces complex safety challenges, particularly in dynamic and unpredictable environments where AI-enabled perception systems must operate reliably. Ensuring cโฆ
A critical bottleneck hindering further advancement in embodied AI and robotics is the challenge of scaling robot data. To address this, the field of learning robot manipulation skills from human videโฆ
Understanding human actions is critical for advancing behavior analysis in human-robot interaction. Particularly in tasks that demand quick and proactive feedback, robots must recognize human actions โฆ
The emergence of movable antenna (MA) technology provides a promising way to enhance wireless sensing and communication by introducing spatial degrees of freedom through dynamic array reconfiguration.โฆ
Multi-band sensing has emerged as a key enabler of integrated sensing and communication (ISAC), one of the six primary usage scenarios defined for IMT-2030 (6G). The introduction of frequency range 3 โฆ
Accurate segmentation and localization of left atrial (LA) ablation scars from Late gadolinium enhancement (LGE)-MRI is essential for assessing the lesion completeness and guiding ablation therapy. Inโฆ
Power electronic converters are fundamental building blocks of both AC and DC microgrids, enabling the integration of renewable energy sources, energy storage systems, electronic loads, and electric vโฆ
Deploying a neuro-symbolic task planner on a new domain today requires significant manual effort: a domain expert must author relaxation and complementary rules, and hundreds of training problems mustโฆ
Embodied AI and robotic systems increasingly depend on scalable, diverse, and physically grounded 3D content for simulation-based training and real-world deployment. While 3D generative modeling has aโฆ
The CONCERTO millimeter-wave spectral-imaging instrument was deployed on the Atacama Pathfinder EXperiment (APEX), where it acquired science data between April 2021 and May 2023. The instrument featurโฆ
The integration of multimodal sensing and millimeter-wave (mmWave) communications is a key enabler for highly mobile vehicle-to-infrastructure (V2I) networks. However, continuous high-resolution visuaโฆ
Cross-lingual speaker verification suffers from severe language-speaker entanglement. This causes systematic degradation in the hardest scenario: correctly accepting utterances from the same speaker aโฆ
Conventional neural speech codecs suffer from severe intelligibility degradation at ultra-low bitrates, where the bottleneck transitions from acoustic distortion to semantic loss. To address this issuโฆ
Reliable backup localization for unmanned aerial vehicles (UAVs) operating in GNSS-denied nighttime conditions remains an open challenge due to the severe modality gap between daytime RGB maps and nigโฆ
Preserving a speaker's voice identity while generating speech in a different language remains a fundamental challenge in spoken language technology, particularly in specialized domains such as scientiโฆ
Generative motion prediction must satisfy three simultaneous requirements for real-world autonomy: high accuracy, diverse multimodal futures, and strictly bounded latency. Diffusion models meet the fiโฆ
Robotic systems that interact with the physical world must reason about kinematic and dynamic constraints imposed by their own embodiment, their environment, and the task at hand. We introduce KinDER,โฆ
Federated inference enhances LLM performance in edge computing through weighted averaging of distributed model predictions. However, autoregressive LLM inference requires frequent full-model forward pโฆ
Free open-access publishing with Google Scholar indexing.
Submission Guide โ