Maico Leberle — Research Repository

Computer Science Preprint

NuggetIndex: Governed Atomic Retrieval for Maintainable RAG

Saber Zerhoudi, Michael Granitzer, Jelena Mitrovic · 2026

Retrieval-augmented generation (RAG) systems are frequently evaluated via fact-based metrics, yet standard implementations retrieve passages or static propositions. This unit mismatch between evaluati…

AI & Data Science Preprint

MAIC-UI: Making Interactive Courseware with Generative UI

Shangqing Tu, Yanjia Li, Keyu Chen, Sichen Zhang, Jifan Yu, Daniel Zhang-Li, Lei Hou, Juanzi Li, Yu Zhang, Huiqin Liu · 2026

Creating interactive STEM courseware traditionally requires HTML/CSS/JavaScript expertise, leaving barriers for educators. While generative AI can produce HTML codes, existing tools generate static pr…

AI & Data Science Preprint

Marco-MoE: Open Multilingual Mixture-of-Expert Language Models with Efficient Upcycling

Fan Jiang, Yu Zhao, Chenyang Lyu, Tianqi Shi, Yichao Du, Feihu Jiang, Longyue Wang, Weihua Luo · 2026

We present Marco-MoE, a suite of fully open multilingual sparse Mixture-of-Experts (MoE) models. Marco-MoE features a highly sparse design in which only around 5% of the total parameters are activate…

Computer Science Preprint

Lost in Decoding? Reproducing and Stress-Testing the Look-Ahead Prior in Generative Retrieval

Kidist Amde Mekonnen, Yongkang Li, Yubao Tang, Simon Lupart, Maarten de Rijke · 2026

Generative retrieval (GR) ranks documents by autoregressively generating document identifiers. Because many GR methods rely on trie-constrained beam search, they are vulnerable to early pruning of rel…

Computer Science Preprint

A Parametric Memory Head for Continual Generative Retrieval

Kidist Amde Mekonnen, Yubao Tang, Maarten de Rijke · 2026

Generative information retrieval (GenIR) consolidates retrieval into a single neural model that decodes document identifiers (docids) directly from queries. While this model-as-index paradigm offers a…

Computer Science Preprint

Efficient Rationale-based Retrieval: On-policy Distillation from Generative Rerankers based on JEPA

Teng Chen, Sheng Xu, Feixiang Guo, Xiaoyu Wang, Qingqing Gu, Hongyan Li, Luo Ji · 2026

Unlike traditional fact-based retrieval, rationale-based retrieval typically necessitates cross-encoding of query-document pairs using large language models, incurring substantial computational costs.…

AI & Data Science Preprint

Benchmarking Vision Foundation Models for Domain-Generalizable Face Anti-Spoofing

Mika Feng, Pierre Gallin-Martel, Koichi Ito, Takafumi Aoki · 2026

Face Anti-Spoofing (FAS) remains challenging due to the requirement for robust domain generalization across unseen environments. While recent trends leverage Vision-Language Models (VLMs) for semantic…

AI & Data Science Preprint

MARCO: Navigating the Unseen Space of Semantic Correspondence

Claudia Cuttano, Gabriele Trivigno, Carlo Masone, Stefan Roth · 2026

Recent advances in semantic correspondence rely on dual-encoder architectures, combining DINOv2 with diffusion backbones. While accurate, these billion-parameter models generalize poorly beyond traini…

Physics Preprint

Enhancing Ly$\alpha$ Emitter Identification in HETDEX with a Convolutional Neural Network

Shiro Mukae, Erin Mentuch Cooper, Karl Gebhardt, Dustin Davis, Lindsay R. House, Mahdi Qezlou, Julian B. Munoz, Shun Saito, Daniel J. Farrow, Caryl Gronwall, Donald P. Schneider, Eric Gawiser · 2026

We present a deep learning framework to enhance the identification of Ly$\alpha$ emitters (LAEs) in the Hobby-Eberly Telescope Dark Energy Experiment (HETDEX), an untargeted spectroscopic survey of LA…

AI & Data Science Preprint

ODUTQA-MDC: A Task for Open-Domain Underspecified Tabular QA with Multi-turn Dialogue-based Clarification

Zhensheng Wang, ZhanTeng Lin, Wenmian Yang, Kun Zhou, Yiquan Zhang, Weijia Jia · 2026

The advancement of large language models (LLMs) has enhanced tabular question answering (Tabular QA), yet they struggle with open-domain queries exhibiting underspecified or uncertain expressions. To …

Computer Science Preprint

Reproduction Beyond Benchmarks: ConstBERT and ColBERT-v2 Across Backends and Query Distributions

Utshab Kumar Ghosh, Ashish David, Shubham Chatterjee · 2026

Reproducibility must validate architectural robustness, not just numerical accuracy. We evaluate ColBERT-v2 and ConstBERT across five dimensions, finding that while ConstBERT reproduces within 0.05% M…

Mathematics Preprint

Determinantally Equivalent Functions Beyond the Nowhere-Zero Case

Harry Sapranidis Mantelos · 2026

Let $\Lambda$ be a set and $\mathbb{F}$ a field, and suppose that $K,Q:\Lambda^2\to\mathbb{F}$ are two functions such that for any $n\in\mathbb{N}$ and $x_1,x_2,\ldots,x_n\in\Lambda$, the determinants…

Computer Science Preprint

FGR-ColBERT: Identifying Fine-Grained Relevance Tokens During Retrieval

Antonin Jarolim, Martin Fajcik · 2026

Document retrieval identifies relevant documents but does not provide fine-grained evidence cues, such as specific relevant spans. A possible solution is to apply an LLM after retrieval; however, this…

AI & Data Science Preprint

Marco DeepResearch: Unlocking Efficient Deep Research Agents via Verification-Centric Design

Bin Zhu, Qianghuai Jia, Tian Lan, Junyang Ren, Feng Gu, Feihu Jiang, Longyue Wang, Zhao Xu, Weihua Luo · 2026

Deep research agents autonomously conduct open-ended investigations, integrating complex information retrieval with multi-step reasoning across diverse sources to solve real-world problems. To sustain…

Computer Science Preprint

ColBERT-Att: Late-Interaction Meets Attention for Enhanced Retrieval

Raj Nath Patel, Sourav Dutta · 2026

Vector embeddings from pre-trained language models form a core component in Neural Information Retrieval systems across a multitude of knowledge extraction tasks. The paradigm of late interaction, int…

Computer Science Preprint

PIDP-Attack: Combining Prompt Injection with Database Poisoning Attacks on Retrieval-Augmented Generation Systems

Haozhen Wang, Haoyue Liu, Jionghao Zhu, Zhichao Wang, Yongxin Guo, Xiaoying Tang · 2026

Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of applications. However, their practical deployment is often hindered by issues such as outdated knowledge an…

Computer Science Preprint

From Questions to Trust Reports: A LLM-IR Framework for the TREC 2025 DRAGUN Track

Ignacy Alwasiak, Kene Nnolim, Jaclyn Thi, Samy Ateia, Markus Bink, Gregor Donabauer, David Elsweiler, Udo Kruschwitz · 2026

The DRAGUN Track at TREC 2025 targets the growing need for effective support tools that help users evaluate the trustworthiness of online news. We describe the UR_Trecking system submitted for both Ta…

Mathematics Preprint

Coherent RFRS groups

Sam P. Fisher, Marco Linton, Pablo Sanchez-Peralta · 2026

We prove that a finitely generated virtually RFRS group of cohomological dimension at most $2$ is coherent if and only if its second $L^{2}$-Betti number vanishes if and only if it is virtually free-b…

Computer Science Preprint

Overview of the TREC 2025 Retrieval Augmented Generation (RAG) Track

Shivani Upadhyay, Nandan Thakur, Ronak Pradeep, Nick Craswell, Daniel Campos, Jimmy Lin · 2026

The second edition of the TREC Retrieval Augmented Generation (RAG) Track advances research on systems that integrate retrieval and generation to address complex, real-world information needs. Buildin…

Physics Preprint

Ly$\alpha$ Nebulae in HETDEX: The Largest Statistical Census Bridging Ly$\alpha$ Halos and Blobs across Cosmic Noon

Erin Mentuch Cooper, Karl Gebhardt, Dustin Davis, Robin Ciardullo, Chris Byrohl, Chenxu Liu, Maya H. Debski, Oscar A. Chavez Ortiz, Maximilian Fabricius, Daniel J. Farrow, Steven L. Finkelstein, Caryl Gronwall, Gary J. Hill, Maja Lujan Niemeyer, Brianna McKay, Shiro Mukae, Masami Ouchi, Huub Rottgering, Donald P. Schneider, Sarah Tuttle, Lutz Wisotzki, Gregory Zeimann, Sai Zhai · 2026

The Hobby-Eberly Telescope Dark Energy Experiment (HETDEX) is an untargeted ~540 deg$^2$ spectroscopic survey of Ly$\alpha$ emission in the 1.9 < z < 3.5 Universe. In surface brightness, this survey reaches 1{\si…

