John Danskin in Computer Science — Research Repository

Computer Science Preprint PDF DOI

How Hard is it to Decide if a Fact is Relevant to a Query?

Meghyn Bienvenu, Diego Figueira, Pierre Lafourcade · 2026

We consider the following fundamental problem: given a database D, Boolean conjunctive query (CQ) q, and fact f in D, decide whether f is relevant to q with respect to D, i.e., does f belong to a minimal subset …

Scaling Worst-Case Optimal Datalog to GPUs

Yihao Sun, Kunting Qi, Thomas Gilray, Sidharth Kumar, Kristopher Micinski · 2026

Datalog is a declarative logic-programming language used for complex analytic reasoning workloads such as program analysis and graph analytics. Datalog's popularity is due to its unique price-point, m…

3DPipe: A Pipelined GPU Framework for Scalable Generalized Spatial Join over Polyhedral Objects

Lyuheng Yuan, Da Yan, Akhlaque Ahmad, Fusheng Wang · 2026

Spatial join is a fundamental operation in spatial databases. With the rapid growth of 3D data in applications such as LiDAR-based object detection and 3D digital pathology, there is an increasing nee…

Orthogonal Strip Partitioning of Polygons: Lattice-Theoretic Algorithms and Lower Bounds

Jaehoon Chung · 2026

We study a variant of a polygon partition problem, introduced by Chung, Iwama, Liao, and Ahn [ISAAC'25]. Given orthogonal unit vectors $\mathbf{u},\mathbf{v}\in \mathbb{R}^2$ and a polygon $P$ with $n…

RELOAD: A Robust and Efficient Learned Query Optimizer for Database Systems

Seokwon Lee, Jaeyoung Sim, Sihyun Kim, Yuhsing Li, Yiwen Zhu, Kwanghyun Park · 2026

Recent advances in query optimization have shifted from traditional rule-based and cost-based techniques towards machine learning-driven approaches. Among these, reinforcement learning (RL) has attrac…

Sidorenko-Inspired Pessimistic Estimation

Yu-Ting Lin, Hsin-Po Wang · 2026

Recently, Abo Khamis et al. showed how to upper bound the size of a join of multiple tables, a problem essential to query optimization in database theory. They unified earlier works by the following i…

PLOP: Cost-Based Placement of Semantic Operators in Hybrid Query Plans

Qiuyang Mang, Yufan Xiang, Hangrui Zhou, Runyuan He, Jiaxiang Yu, Hanchen Li, Aditya Parameswaran, Alvin Cheung · 2026

Recent database systems have introduced semantic operators that leverage large language models (LLMs) to filter, join, and project over structured data using natural language predicates. In practice, …

Post-Quantum Cryptographic Analysis of Message Transformations Across the Network Stack

Ashish Kundu, Vishal Chakraborty, Ramana Kompella · 2026

When a user sends a message over a wireless network, the message does not travel as-is. It is encrypted, authenticated, encapsulated, and transformed as it descends the protocol stack from the applica…

SynQL: A Controllable and Scalable Rule-Based Framework for SQL Workload Synthesis for Performance Benchmarking

Kahan Mehta, Amit Mankodi · 2026

Database research and the development of learned query optimisers rely heavily on realistic SQL workloads. Acquiring real-world queries is increasingly difficult, however, due to strict privacy regula…

GTaP: A GPU-Resident Fork-Join Task-Parallel Runtime with a Pragma-Based Interface

Yuki Maeda, Kenjiro Taura · 2026

Graphics Processing Units (GPUs) excel at regular data-parallel workloads where massive hardware parallelism can be readily exploited. In contrast, many important irregular applications are naturally …

VectraFlow: Long-Horizon Semantic Processing over Data and Event Streams with LLMs

Shu Chen, Junhan Liu, Deepti Raghavan, Ugur Cetintemel · 2026

Monitoring continuous data for meaningful signals increasingly demands long-horizon, stateful reasoning over unstructured streams. However, today's LLM frameworks remain stateless and one-shot, and tr…

Optimizing Relational Queries over Array-Valued Data in Columnar Systems

Maroua Zeblah (TYREX), Etienne Couritas, Sarah Chlyah (TYREX), Pierre Geneves (TYREX), Nils Gesbert (TYREX), Nabil Layaida (TYREX) · 2026

Modern analytical workloads increasingly combine relational data with array-valued attributes. While columnar database systems efficiently process such workloads, their ability to optimize queries tha…

WN-Wrangle: Wireless Network Data Wrangling Assistant

Anirudh Kamath, Dustin Maas, Jacobus Van der Merwe, Anna Fariha · 2026

Data wrangling continues to be the most time-consuming task in the data science pipeline and wireless network data is no exception. Prior approaches for automatic or assisted data-wrangling primarily …

Computer-Orchestrated Design of Algorithms: From Join Specification to Implementation

Zeyuan Hu · 2026

Equipping query processing systems with provable theoretical guarantees has been a central focus at the intersection of database theory and systems in recent years. However, the divergence between the…

SQL-Commenter: Aligning Large Language Models for SQL Comment Generation with Direct Preference Optimization

Lei Yu, Peng Wang, Jingyuan Zhang, Xin Wang, Jia Xu, Li Yang, Changzhi Deng, Jiajia Ma, Fengjun Zhang · 2026

SQL query comprehension is a significant challenge due to complex syntax, diverse join types, and deep nesting. Many queries lack adequate comments, severely hindering code readability, maintainabilit…

Practical MCTS-based Query Optimization: A Reproducibility Study and new MCTS algorithm for complex queries

Vladimir Burlakov, Alena Rybakina, Sergey Kudashev, Konstantin Gilev, Alexander Demin, Denis Ponomaryov, Yuriy Dorn · 2026

Monte Carlo Tree Search (MCTS) has been proposed as a transformative approach to join-order optimization in database query processing, with recent frameworks such as AlphaJoin and HyperQO claiming to …

Work Sharing and Offloading for Efficient Approximate Threshold-based Vector Join

Kyoungmin Kim, Lennart Roth, Liang Liang, Anastasia Ailamaki · 2026

Vector joins - finding all vector pairs between a set of query and data vectors whose distances are below a given threshold - are fundamental to modern vector and vector-relational database systems th…

Accelerating Approximate Analytical Join Queries over Unstructured Data with Statistical Guarantees

Yuxuan Zhu, Tengjun Jin, Chenghao Mo, Daniel Kang · 2026

Analytical join queries over unstructured data are increasingly prevalent in data analytics. Applying machine learning (ML) models to label every pair in the cross product of tables can achieve state-…

Partial Partial Aggregates

Claude Brisson · 2026

We introduce partial partial aggregates (PPA), a query optimization technique for distributed engines that pushes only the local compute phase of an aggregate operation through joins. A query that agg…

Succinct Structure Representations for Efficient Query Optimization

Zhekai Jiang, Qichen Wang, Christoph Koch · 2026

Structural decomposition methods offer powerful theoretical guarantees for join evaluation, yet they are rarely used in real-world query optimizers. A major reason is the difficulty of combining cost-…
