Giuseppe Cocco — Research Repository

AI & Data Science · Preprint

Learning to Reason: Targeted Knowledge Discovery and Fuzzy Logic Update for Robust Image Recognition

Gurucharan Srinivas, Joshua Niemeijer, Frank Koster · 2026

Integrating domain knowledge into deep neural networks is a promising way to improve generalization. Existing methods either encode prior knowledge in the loss function or apply post-processing module…

Engineering · Preprint

Semantics-Aware Hierarchical Token Communication: Clustering, Bit Mapping, and Power Allocation

Jihoon Lee, Seungeun Oh, Jihong Park, Seong-Lyun Kim, Seung-Woo Ko · 2026

Despite the rise of token communication (TokCom) as a new paradigm beyond traditional bit communication, existing approaches have primarily adopted artificial intelligence (AI)-centric designs that re…

AI & Data Science · Preprint

ViCrop-Det: Spatial Attention Entropy Guided Cropping for Training-Free Small-Object Detection

Hui Wang, Hongze Li, Wei Chen, Xiaojin Zhang · 2026

Transformer-based architectures have established a dominant paradigm in global semantic perception; however, they remain fundamentally constrained by the profound spatial heterogeneity inherent in nat…

AI & Data Science · Preprint

Instruction-Evidence Contrastive Dual-Stream Decoding for Grounded Vision-Language Reasoning

Yashwant Pravinrao Bangde, Debaditya Roy · 2026

Vision-Language Models (VLMs) exhibit strong performance in instruction following and open-ended vision-language reasoning, yet they frequently generate fluent outputs that are weakly grounded in visu…

AI & Data Science · Preprint

Beyond Fidelity: Semantic Similarity Assessment in Low-Level Image Processing

Runjie Wang, Weiling Chen, Tiesong Zhao, Chang Wen Chen · 2026

Low-level image processing has long been evaluated mainly from the perspective of visual fidelity. However, with the rise of deep learning and generative models, processed images may preserve perceptu…

AI & Data Science · Preprint

BMD-45: A Large-Scale CCTV Vehicle Detection Dataset for Urban Traffic in Developing Cities

Akash Sharma, Chinmay Mhatre, Sankalp Gawali, Ruthvik Bokkasam, Brij Sharma, Vishwajeet Pattanaik, Punit Rathore, Raghu Krishnapuram, Vijay Gopal Kovvali, Anirban Chakraborty, Yogesh Simmhan · 2026

Robust vehicle detection from fixed CCTV cameras is critical for Intelligent Transportation Systems. Yet existing benchmarks predominantly feature relatively homogeneous, highly organized traffic patt…

Computer Science · Preprint

Towards Localizing Conversation Partners using Head Motion

Payal Mohapatra, Calvin Murdock, Ali Aroudi, Ishwarya Ananthabhotla, Anjali Menon, Buye Xu, Morteza Khaleghimeybodi · 2026

Many individuals struggle to understand conversation partners in noisy settings, particularly amid background speakers or due to hearing impairments. Emerging wearables like smartglasses offer a trans…

AI & Data Science · Preprint

Exploring Hierarchical Consistency and Unbiased Objectness for Open-Vocabulary Object Detection

Sanghoon Lee, Geon Lee, Hyekang Park, Bumsub Ham · 2026

Conventional object detectors typically operate under a closed-set assumption, limiting recognition to a predefined set of base classes seen during training. Open-vocabulary object detection (OVD) add…

AI & Data Science · Preprint

Hard to See, Hard to Label: Generative and Symbolic Acquisition for Subtle Visual Phenomena

Renjith Prasad, Rishabh Sharma, Andrew E. Shao, Annmary Justine Koomthanam, Shreyas Kulkarni, Suparna Bhattacharya, Martin Foltin, Amit Sheth, David Orozco, Matthew Quinn, Brian Sammuli · 2026

Subtle visual anomalies such as hairline cracks, sub-millimeter voids, and low-contrast inclusions are structurally atypical yet visually ambiguous, making them both difficult to annotate and easy to …

AI & Data Science · Preprint

Federated Cross-Modal Retrieval with Missing Modalities via Semantic Routing and Adapter Personalization

Hefeng Zhou, Xuan Liu, Sicheng Chen, Wutong Zhang, Wu Yan, Jiong Lou, Chentao Wu, Guangtao Xue, Wei Zhao, Jie Li · 2026

Federated cross-modal retrieval faces severe challenges from heterogeneous client data, particularly non-IID semantic distributions and missing modalities. Under such heterogeneity, a single global mo…

AI & Data Science · Preprint

CoCo-SAM3: Harnessing Concept Conflict in Open-Vocabulary Semantic Segmentation

Yanhui Chen, Baoyao Yang, Siqi Liu, Jingchao Wang · 2026

SAM3 advances open-vocabulary semantic segmentation by introducing a prompt-driven mask generation paradigm. However, in multi-class open-vocabulary scenarios, masks generated independently from diffe…

AI & Data Science · Preprint

T-REN: Learning Text-Aligned Region Tokens Improves Dense Vision-Language Alignment and Scalability

Savya Khosla, Sethuraman T V, Aryan Chadha, Alex Schwing, Derek Hoiem · 2026

Despite recent progress, vision-language encoders struggle with two core limitations: (1) weak alignment between language and dense vision features, which hurts tasks like open-vocabulary semantic seg…

Engineering · Preprint

VIDS: A Verified Imaging Dataset Standard for Medical AI

Joan S. Muthu, John Shalen · 2026

Medical imaging AI development is fundamentally dependent on annotated datasets, yet no existing standard provides machine-enforceable validation across dataset structure, annotation provenance, quali…

AI & Data Science · Preprint

Prompt Sensitivity in Vision-Language Grounding: How Small Changes in Wording Affect Object Detection

Dawar Jyoti Deka, Amit Sethi, Syed Mohammad Ali · 2026

Vision-language models enable open-vocabulary object grounding through natural language queries, under the implicit assumption that semantically equivalent descriptions yield consistent outputs. We ex…

AI & Data Science · Preprint

Lorentz Framework for Semantic Segmentation

Zahid Hasan, Masud Ahmed, Nirmalya Roy · 2026

Semantic segmentation in hyperbolic space enables compact modeling of hierarchical structure while providing inherent uncertainty quantification. Prior approaches predominantly rely on the Poincaré …

AI & Data Science · Preprint

Beyond Feature Fusion: Contextual Bayesian PEFT for Multimodal Uncertainty Estimation

Habibeh Naderi, Behrouz Haji Soleimani, Stan Matwin · 2026

We introduce CoCo-LoRA, a multimodal, uncertainty-aware parameter-efficient fine-tuning method for text prediction tasks accompanied by audio context. Existing PEFT approaches such as LoRA are efficie…

AI & Data Science · Preprint

Weak-to-Strong Knowledge Distillation Accelerates Visual Learning

Baiang Li, Wenhao Chai, Felix Heide · 2026

Large-scale visual learning is increasingly limited by training cost. Existing knowledge distillation methods transfer from a stronger teacher to a weaker student for compression or final-accuracy imp…

AI & Data Science · Preprint

DETR-ViP: Detection Transformer with Robust Discriminative Visual Prompts

Bo Qian, Dahu Shi, Xing Wei · 2026

Visual prompted object detection enables interactive and flexible definition of target categories, thereby facilitating open-vocabulary detection. Since visual prompts are derived directly from image …

AI & Data Science · Preprint

Prompt-Guided Image Editing with Masked Logit Nudging in Visual Autoregressive Models

Amir El-Ghoussani, Marc Holle, Gustavo Carneiro, Vasileios Belagiannis · 2026

We address the problem of prompt-guided image editing in visual autoregressive models. Given a source image and a target text prompt, we aim to modify the source image according to the target prompt, …

AI & Data Science · Preprint

ASTRA: Enhancing Multi-Subject Generation with Retrieval-Augmented Pose Guidance and Disentangled Position Embedding

Tianze Xia, Zijian Ning, Zonglin Zhao, Mingjia Wang · 2026

Subject-driven image generation has shown great success in creating personalized content, but its capabilities are largely confined to single subjects in common poses. Current approaches face a fundam…