920+ open-access research outputs.
Retrieval-augmented generation (RAG) systems are frequently evaluated via fact-based metrics, yet standard implementations retrieve passages or static propositions. This unit mismatch between evaluati…
Creating interactive STEM courseware traditionally requires HTML/CSS/JavaScript expertise, which creates barriers for educators. While generative AI can produce HTML code, existing tools generate static pr…
We present Marco-MoE, a suite of fully open multilingual sparse Mixture-of-Experts (MoE) models. Marco-MoE features a highly sparse design in which only around 5\% of the total parameters are activate…
Generative retrieval (GR) ranks documents by autoregressively generating document identifiers. Because many GR methods rely on trie-constrained beam search, they are vulnerable to early pruning of rel…
Generative information retrieval (GenIR) consolidates retrieval into a single neural model that decodes document identifiers (docids) directly from queries. While this model-as-index paradigm offers a…
Unlike traditional fact-based retrieval, rationale-based retrieval typically necessitates cross-encoding of query-document pairs using large language models, incurring substantial computational costs.…
Face Anti-Spoofing (FAS) remains challenging due to the requirement for robust domain generalization across unseen environments. While recent trends leverage Vision-Language Models (VLMs) for semantic…
Recent advances in semantic correspondence rely on dual-encoder architectures, combining DINOv2 with diffusion backbones. While accurate, these billion-parameter models generalize poorly beyond traini…
We present a deep learning framework to enhance the identification of Ly$\alpha$ emitters (LAEs) in the Hobby-Eberly Telescope Dark Energy Experiment (HETDEX), an untargeted spectroscopic survey of LA…
The advancement of large language models (LLMs) has enhanced tabular question answering (Tabular QA), yet these models struggle with open-domain queries containing underspecified or uncertain expressions. To …
Reproducibility must validate architectural robustness, not just numerical accuracy. We evaluate ColBERT-v2 and ConstBERT across five dimensions, finding that while ConstBERT reproduces within 0.05% M…
Let $\Lambda$ be a set and $\mathbb{F}$ a field, and suppose that $K,Q:\Lambda^2\to\mathbb{F}$ are two functions such that for any $n\in\mathbb{N}$ and $x_1,x_2,\ldots,x_n\in\Lambda$, the determinants…
Document retrieval identifies relevant documents but does not provide fine-grained evidence cues, such as specific relevant spans. A possible solution is to apply an LLM after retrieval; however, this…
Deep research agents autonomously conduct open-ended investigations, integrating complex information retrieval with multi-step reasoning across diverse sources to solve real-world problems. To sustain…
Vector embeddings from pre-trained language models form a core component in Neural Information Retrieval systems across a multitude of knowledge extraction tasks. The paradigm of late interaction, int…
Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of applications. However, their practical deployment is often hindered by issues such as outdated knowledge an…
The DRAGUN Track at TREC 2025 targets the growing need for effective support tools that help users evaluate the trustworthiness of online news. We describe the UR_Trecking system submitted for both Ta…
We prove that a finitely generated virtually RFRS group of cohomological dimension at most $2$ is coherent if and only if its second $L^{2}$-Betti number vanishes if and only if it is virtually free-b…
The second edition of the TREC Retrieval Augmented Generation (RAG) Track advances research on systems that integrate retrieval and generation to address complex, real-world information needs. Buildin…
The Hobby-Eberly Telescope Dark Energy Experiment (HETDEX) is an untargeted $\sim$540 deg$^2$ spectroscopic survey of Ly$\alpha$ emission in the $1.9 < z < 3.5$ Universe. In surface brightness, this survey reaches 1{\si…