David Lowry Duda in Computer Science — Research Repository

Computer Science Preprint

Revealing NVIDIA Closed-Source Driver Command Streams for CPU-GPU Runtime Behavior Insight

Yuang Yan, Ian Karlin, Ryan Grant · 2026

For NVIDIA GPUs, CUDA is the primary interface through which applications orchestrate GPU execution, yet much of the logic that realizes CUDA operations resides in NVIDIA's closed-source userspace dri…

Computer Science Preprint

FACT: Compositional Kernel Synthesis with a Three-Stage Agentic Workflow

Sina Heidari, Dimitrios S. Nikolopoulos · 2026

Deep learning compilers and vendor libraries deliver strong baseline performance but are bounded by finite, engineer-curated catalogs. When these omit needed optimizations, practitioners substitute ha…

Computer Science Preprint

CUDA Kernel Optimization and Counter-Free Performance Analysis for Depthwise Convolution in Cloud Environments

Huriyeh Babak, Melanie Schaller · 2026

Efficient GPU execution of convolution operators is governed by memory-access efficiency, on-chip data reuse, and execution mapping rather than arithmetic throughput alone. This paper presents a contr…

Computer Science Preprint

Prompt-Unknown Promotion Attacks against LLM-based Sequential Recommender Systems

Yuchuan Zhao, Tong Chen, Junliang Yu, Zongwei Wang, Lizhen Cui, Hongzhi Yin · 2026

Large language model-powered sequential recommender systems (LLM-SRSs) have recently demonstrated remarkable performance, enabling recommendations through prompt-driven inference over user interaction…

Computer Science Preprint

ClusterFusion++: Expanding Cluster-Level Fusion to Full Transformer-Block Decoding

ChiHeng Jin, Hongche Yu, Xihui Chen · 2026

Large language model (LLM) decoding is latency-sensitive and often bottlenecked by fragmented operator execution and repeated off-chip materialization of intermediate tensors. Prior work expands fusio…

Computer Science Preprint

ARCHES: Adaptive Real-Time Switching of AI Models for the RAN

Neagin Neasamoni Santhi, Davide Villa, Michele Polese, Salvatore D'Oro, Yunseong Lee, Koichiro Furueda, Tommaso Melodia · 2026

Artificial Intelligence (AI) has become a powerful tool for model-free Radio Access Network (RAN) signal processing and optimization. However, designing a single model that generalizes across all radi…

Computer Science Preprint

Accelerating Intra-Node GPU-to-GPU Communication Through Multi-Path Transfers with CUDA Graphs

Amirhossein Sojoodi, Yiltan Hassan Temucin, Amirreza Baratisedeh, Hamed Sharifian, Ahmad Afsahi · 2026

Effective intra-node GPU communication is essential for optimizing performance in MPI-based HPC applications, especially when leveraging multiple communication paths. In this study, we propose a novel…

Computer Science Preprint

FlashSpread: IO-Aware GPU Simulation of Non-Markovian Epidemic Dynamics via Kernel Fusion

Heman Shakeri, Behnaz Moradi-Jamei, Aram Vajdi, Ehsan Ardjmand · 2026

Non-Markovian (renewal) epidemic simulation on multi-million-node contact networks is essential for realistic forecasting under general age-dependent holding-time distributions (log-normal, Weibull, E…

Computer Science Preprint

CuRast: Cuda-Based Software Rasterization for Billions of Triangles

Markus Schutz, Lukas Lipp, Elias Kristmann, Michael Wimmer · 2026

Previous work shows that small triangles can be rasterized efficiently with compute shaders. Building on this insight, we explore how far this can be pushed for massive triangle datasets without the n…

Computer Science Preprint

Analysis of AWW (Anganwadi Workers) Training Content, ILA (Incremental Learning Approach) Modules Following CDT (Component Display Theory)

Arka Majhi, Satish B. Agnihotri · 2026

POSHAN Abhiyan envisages capacity building of AWWs or frontline health workers through 21 training modules of ILA (Incremental Learning Approach), modularising the net learning content into smaller le…

Computer Science Preprint

A Fully GPU-Accelerated Framework for High-Performance Configuration Interaction Selection with Neural Network Quantum States

Daran Sun, Bowen Kan, Haoquan Long, Hairui Zhao, Haoxu Li, Yicheng Liu, Pengyu Zhou, Ankang Feng, Wenjing Huang, Yida Gu, Zhenyu Li, Honghui Shang, Yunquan Zhang, Dingwen Tao, Ninghui Sun, Guangming Tan · 2026

AI-driven methods have demonstrated considerable success in tackling the central challenge of accurately solving the Schrödinger equation for complex many-body systems. Among neural network quantum …

Computer Science Preprint

Fleet: Hierarchical Task-based Abstraction for Megakernels on Multi-Die GPUs

Sangeeta Chowdhary, Ryan Swann, Sean Siddens, Muhammad Osama, Stephen Neuendorffer, Alexandru Dutu, Karthik Sangaiah, Sandeepa Bhuyan, Samuel Bayliss, Ganesh Dasika · 2026

Modern GPUs adopt chiplet-based designs with multiple private cache hierarchies, but current programming models (CUDA/HIP) expose a flat execution hierarchy that cannot express chiplet-level locality …

Computer Science Preprint

AVID: A Benchmark for Omni-Modal Audio-Visual Inconsistency Understanding via Agent-Driven Construction

Zixuan Chen, Depeng Wang, Hao Lin, Li Luo, Ke Xu, Ya Guo, Huijia Zhu, Tanfeng Sun, Xinghao Jiang · 2026

We present AVID, the first large-scale benchmark for audio-visual inconsistency understanding in videos. While omni-modal large language models excel at temporally aligned tasks such as captioning and…

Computer Science Preprint

Fast Voxelization and Level of Detail for Microgeometry Rendering

Javier Fabre, Carlos Castillo, Carlos Rodriguez-Pardo, Jorge Lopez-Moreno · 2026

Many materials show anisotropic light scattering patterns due to the shape and local alignment of their underlying micro structures: surfaces with small elements such as fibers, or the ridges of a bru…

Computer Science Preprint

A Non-Probabilistic Game-Theoretic Information Theory Which Subsumes Probabilistic Channel Coding

Cheuk Ting Li · 2026

Probabilistic settings (e.g., vanishing-error channel coding) and non-probabilistic settings (e.g., zero-error channel coding and adversarial channels) were considered two related but different branch…

Computer Science Preprint

City-Scale Visibility Graph Analysis via GPU-Accelerated HyperBall

Alex Hodge, Melissa Barrientos Trinanes · 2026

Visibility Graph Analysis (VGA) is a key space syntax method for understanding how spatial configuration shapes human movement, but its reliance on all-pairs BFS computation limits practical applicati…

Computer Science Preprint

Foundry: Template-Based CUDA Graph Context Materialization for Fast LLM Serving Cold Start

Xueshen Liu, Yongji Wu, Yuncheng Yao, Danyang Zhuo, Ion Stoica, Z. Morley Mao · 2026

Modern LLM service providers increasingly rely on autoscaling and parallelism reconfiguration to respond to rapidly changing workloads, but cold-start latency remains a major bottleneck. While recent …

Computer Science Preprint

The Theorems of Dr. David Blackwell and Their Contributions to Artificial Intelligence

Napoleon Paxton · 2026

Dr. David Blackwell was a mathematician and statistician of the first rank, whose contributions to statistical theory, game theory, and decision theory predated many of the algorithmic breakthroughs t…

Computer Science Preprint

JZ-Tree: GPU friendly neighbour search and friends-of-friends with dual tree walks in JAX plus CUDA

Jens Stucker, Oliver Hahn, Lukas Winkler, Adrian Gutierrez Adame, Thomas Floss · 2026

Algorithms based on spatial tree traversal are widely regarded as among the most efficient and flexible approaches for many problems in CPU-based high-performance computing (HPC). However, directly tr…

Computer Science Preprint

Adaptive Tensor Network Simulation via Entropy-Feedback PID Control and GPU-Accelerated SVD

Harshni Kumaresan, Gayathri Muruganantham, Lakshmi Rajendran, Santhosh Sivasubramani · 2026

Tensor network methods, particularly those based on Matrix Product States (MPS), provide a powerful framework for simulating quantum many-body systems. A persistent computational challenge in these me…
