Databases — Research Repository

Computer Science Preprint

Adaptive and AI-Augmented Security Testing: A Systematic Survey of Program Analysis, Feedback-Driven Testing, and Hybrid Learning-Based Approaches

Michael Wienczkowski · 2026

Modern software systems are increasingly developed within rapid continuous integration and deployment (CI/CD) pipelines, where ensuring security prior to release presents significant technical and org…

Computer Science Preprint

CacheRAG: A Semantic Caching System for Retrieval-Augmented Generation in Knowledge Graph Question Answering

Yushi Sun, Lei Chen · 2026

The integration of Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) has significantly advanced Knowledge Graph Question Answering (KGQA). However, existing LLM-driven KGQA system…

Computer Science Preprint

Budget-Constrained Online Retrieval-Augmented Generation: The Chunk-as-a-Service Model

Shawqi Al-Maliki, Ammar Gharaibeh, Mohamed Rahouti, Mohammad Ruhul Amin, Mohamed Abdallah, Junaid Qadir, Ala Al-Fuqaha · 2026

Large Language Models (LLMs) have revolutionized the field of natural language processing. However, they exhibit some limitations, including a lack of reliability and transparency: they may hallucinat…

Computer Science Preprint

Health System Scale Semantic Search Across Unstructured Clinical Notes

Faith Wavinya Mutinda, Spandana Makeneni, Anna Lin, Shivaji Dutta, Irit R. Rasooly, Patrick Dibussolo, Shivani Kamath Belman, Hessam Shahriari, Kevin Murphy, Alex B. Ruan, Barbara H. Chaiyachati, Sanjay Chainani, Robert W. Grundmeier, Scott M. Haag, Jeffrey M. Miller, Heather M. Griffis, Ian M. Campbell · 2026

Introduction: Semantic search, which retrieves documents based on conceptual similarity rather than keyword matching, offers substantial advantages for retrieval of clinical information. However, depl…

AI & Data Science Preprint

DualGeo: A Dual-View Framework for Worldwide Image Geo-localization

Junchao Cui, Wenqi Shi, Shaoyong Du, Hang He, Xuanzi Ma, Hao Tang, Xiangyang Luo · 2026

Worldwide image geo-localization aims to infer the geographic location of an image captured anywhere on Earth, spanning street, city, regional, national, and continental scales. Existing methods rely …

Physics Preprint

Enabling real-time multi-messenger follow-up of transient events with Astro-COLIBRI

Bernardo Cornejo Avila, Sofia Bisero, Mickael Costa, Antoine Ciric, Ilja Jaroschewski, Weizmann Kiendrebeogo, Fabian Schussler · 2026

Time-domain astrophysics is a rapidly growing field focused on the study of transient phenomena such as Gamma-Ray Bursts (GRBs), Fast Radio Bursts (FRBs), supernovae, novae, and AGN flares. Their char…

Computer Science Preprint

GeoSearch: Augmenting Worldwide Geolocalization with Web-Scale Reverse Image Search and Image Matching

Tung-Duong Le-Duc, Hoang-Quoc Nguyen-Son, Minh-Son Dao · 2026

Worldwide image geolocalization, which aims to predict the GPS coordinates of any image on Earth, remains challenging due to global visual diversity. Recent generative approaches based on Retrieval-Au…

AI & Data Science Preprint

Wiki Dumps to Training Corpora: South Slavic Case

Mihailo Skoric · 2026

This paper presents a methodology for transforming raw Wikimedia dumps into quality textual corpora for seven South Slavic languages. The work is divided into two major phases. The first involves extr…

AI & Data Science Preprint

The Structured Output Benchmark: A Multi-Source Benchmark for Evaluating Structured Output Quality in Large Language Models

Abhinav Kumar Singh, Harsha Vardhan Khurdula, Yoeven D Khemlani, Vineet Agarwal · 2026

Large Language Models are increasingly being deployed to extract structured data from unstructured and semi-structured sources: parsing invoices, medical records, and converting PDF documents to datab…

Computer Science Preprint

Fixed-parameter tractable inference for discrete probabilistic programs, via string diagram algebraisation

Benedikt Peterseim, Milan Lopuhaa-Zwakenberg · 2026

Discrete probabilistic programs (DPPs) provide a highly expressive formalism for compactly defining arbitrary finite probabilistic models. This expressivity comes at a price: DPP inference is PSPACE-h…

Computer Science Preprint

VisualNeo: Bridging the Gap between Visual Query Interfaces and Graph Query Engines

Kai Huang, Houdong Liang, Chongchong Yao, Xi Zhao, Yue Cui, Yao Tian, Ruiyuan Zhang, Xiaofang Zhou · 2026

Visual Graph Query Interfaces (VQIs) empower non-programmers to query graph data by constructing visual queries intuitively. Devising efficient technologies in Graph Query Engines (GQEs) for interacti…

Mathematics Preprint

A Combinatorial Optimisation Approach to Multi-factorial Gap-filling in Genome-scale Metabolic Models (GEMs)

Philip Kilby, Sevvandi Kandanaarachchi, Matthew J. Morgan, Amy M. Paten, Mariana Velasque, Andrew C. Warden, Juan P. Molina Ortiz · 2026

Genome-Scale Metabolic Models (GEMs) describe the interactions between genes, proteins, and the biochemical reactions that underpin an organism's metabolism aiming to computationally simulate function…

AI & Data Science Preprint

Semantic Layers for Reliable LLM-Powered Data Analytics: A Paired Benchmark of Accuracy and Hallucination Across Three Frontier Models

Michael Rumiantsau, Ivan Fokeev · 2026

LLMs deployed for natural-language querying of analytical databases suffer from two intertwined failures - incorrect answers and confident hallucinations - both rooted in the same cause: the model is …

Computer Science Preprint

Scalable Secure Biometric Authentication without Auxiliary Identifiers

Alexander Bienstock, Daniel Escudero, Antigoni Polychroniadou, Zhen Zeng, Pranav Bhat, Ashok Singal, Prashant Sharma, Manuela Veloso · 2026

The prevalence of biometric authentication has been on the rise due to its ease of use and elimination of weak passwords. To date, most biometric authentication systems have been designed for on-devic…

AI & Data Science Preprint

CiteRadar: A Citation Intelligence Platform for Researcher Profiling and Geographic Visualization

Chenxu Niu, Yiming Sun · 2026

Understanding the geographic reach and community structure of one's scholarly citations is increasingly valuable for career development, grant applications, and collaboration discovery -- yet accessib…

AI & Data Science Preprint

Subjective Portrait Region Cropping in Landscape Videos with Temporal Annotation Smoothing

Cheng-Han Lee, Maniratnam Mandal, Neil Birkbeck, Yilin Wang, Balu Adsumilli, Alan C. Bovik · 2026

With the rise of mobile video consumption on diverse handheld display resolutions and orientation modes, altering videos to aspect ratios poses challenges. Static cropping and border padding often com…

Computer Science Preprint

FGDM: Reasoning Aware Multi-Agentic Framework for Software Bug Detection using Chain of Thought and Tree of Thought Prompting

Srita Padmanabhuni, Bhargavi Karuturi, Jerusha Karen Indupalli, Santhan Reddy Chilla, Vivek Yelleti · 2026

Deep Learning methods are becoming prominent in automated software bug detection; however, they lack the global understanding of the given code. Consequently, their performance tends to degrade, espec…

Computer Science Preprint

BoomHQ: Learning to Boost Multiple Hybrid Queries on Vector DBMSs

Ermu Qiu, Tianyi Chen, Jun Gao, Xing Wei, Yaofeng Tu, Yinjun Han, Yang Lin · 2026

Hybrid queries, which combine vector nearest neighbor searches with scalar predicates, represent a fundamental challenge in managing vector databases. Existing methods often restrict the number of vec…

Computer Science Preprint

Why AI Harms Can't Be Fixed One Identity at a Time: What 5300 Incident Reports Reveal About Intersectionality

Edyta Bogucka, Sanja Scepanovic, Daniele Quercia · 2026

AI risk assessment is the primary tool for identifying harms caused by AI systems. These include intersectional harms, which arise from the interaction between identity categories (e.g., class and ski…

Computer Science Preprint

Geometric Analysis of Self-Supervised Vision Representations for Semantic Image Retrieval

Esteban Rodriguez-Betancourt, Edgar Casasola-Murillo · 2026

Content-based image retrieval (CBIR) systems enable users to search images based on visual content instead of relying on metadata. The text domain has benefited from vector search of representations c…
