Expertini Research

Browse Research Papers

4,991+ open-access research outputs.

Showing 4991 results for "david lowry duda"
Economics & Finance Preprint PDF DOI

Fast-Vollib: A Fast Implied Volatility Library for Python with PyTorch, JAX, and CUDA Fused-Kernel Backends

Raeid Saqur · 2026

We present fast-vollib, an open-source Python library that provides high-performance European option pricing, implied volatility (IV) computation, and Greeks under the Black-76, Black-Scholes, and Bla…

Computer Science Preprint PDF DOI

Revealing NVIDIA Closed-Source Driver Command Streams for CPU-GPU Runtime Behavior Insight

Yuang Yan, Ian Karlin, Ryan Grant · 2026

For NVIDIA GPUs, CUDA is the primary interface through which applications orchestrate GPU execution, yet much of the logic that realizes CUDA operations resides in NVIDIA's closed-source userspace dri…

AI & Data Science Preprint PDF DOI

MesonGS++: Post-training Compression of 3D Gaussian Splatting with Hyperparameter Searching

Shuzhao Xie, Junchen Ge, Weixiang Zhang, Jiahang Liu, Chen Tang, Yunpeng Bai, Shijia Ge, Jingyan Jiang, Yuzhi Huang, Fengnian Yang, Cong Zhang, Xiaoyi Fan, Zhi Wang · 2026

3D Gaussian Splatting (3DGS) achieves high-quality novel view synthesis with real-time rendering, but its storage cost remains prohibitive for practical deployment. Existing post-training compression …

Physics Preprint PDF DOI

SPHEREx Ultracool Dwarf spectral Atlas (SUDA): Atmospheric and Fundamental Parameters of Ultracool Dwarfs

Zhijun Tu, Shu Wang, Haomiao Huang, Xiaodian Chen, Jifeng Liu · 2026

We present the SPHEREx Ultracool Dwarf spectral Atlas (SUDA), a homogeneous sample of 1675 ultracool dwarfs with continuous 0.75--5 $\mu$m spectroscopy from SPHEREx QR2. Using the SAND and ATMO2020++ …

Computer Science Preprint PDF DOI

FACT: Compositional Kernel Synthesis with a Three-Stage Agentic Workflow

Sina Heidari, Dimitrios S. Nikolopoulos · 2026

Deep learning compilers and vendor libraries deliver strong baseline performance but are bounded by finite, engineer-curated catalogs. When these omit needed optimizations, practitioners substitute ha…

Physics Preprint PDF DOI

$\texttt{cuSkyrmion}$: A CUDA-OpenGL framework for interactive simulation and visualization of nuclei as Skyrmions

Sven Bjarke Gudnason, Paul Leask · 2026

We introduce $\texttt{cuSkyrmion}$, a 3-dimensional Skyrme model computation and visualization software, that is written in CUDA C for rapid computation and visualization of especially the arrested Ne…

AI & Data Science Preprint PDF DOI

DDA-Thinker: Decoupled Dual-Atomic Reinforcement Learning for Reasoning-Driven Image Editing

Hanqing Yang, Qiang Zhou, Yongchao Du, Sashuai Zhou, Zhibin Wang, Jun Song, Tiezheng Ge, Cheng Yu, Bo Zheng · 2026

Recent image editing models have achieved strong visual fidelity but often struggle with tasks requiring complex reasoning. To investigate and enhance the reasoning-grounded planning for image editing…

Computer Science Preprint PDF DOI

CUDA Kernel Optimization and Counter-Free Performance Analysis for Depthwise Convolution in Cloud Environments

Huriyeh Babak, Melanie Schaller · 2026

Efficient GPU execution of convolution operators is governed by memory-access efficiency, on-chip data reuse, and execution mapping rather than arithmetic throughput alone. This paper presents a contr…

Mathematics Preprint PDF DOI

A colimit decomposition for the loop homology of polyhedral products

Lewis Stanton, Fedor Vylegzhanin · 2026

We show that the loop homology algebras of polyhedral products of the form $(\underline{X},\underline{*})^{\mathcal{K}}$ can be written as a colimit over the flagification of $\mathcal{K}$, and obtain…

AI & Data Science Preprint PDF DOI

PointTransformerX: Portable and Efficient 3D Point Cloud Processing without Sparse Algorithms

Laurenz Reichardt, Nikolas Ebert, Oliver Wasenmuller · 2026

3D point cloud perception remains tightly coupled to custom CUDA operators for spatial operations, limiting portability and efficiency on non-NVIDIA, AMD, and embedded hardware. We introduce PointTran…

Mathematics Preprint PDF DOI

SUDA-Muon: Structural Design Principles and Boundaries for Fully Decentralized Muon

Hengrui Zhang, Boao Kong, Jiahe Geng, Zhengyang Huang · 2026

Fully decentralized Muon is difficult because its nonlinear matrix-sign operator does not commute with linear gossip averaging. This makes decentralized Muon a structural design problem: in designing …

Mathematics Preprint PDF DOI

Sharp pathwise nonuniqueness for additive SDEs

Elias Hess-Childs, Keefer Rowan · 2026

We construct a family of velocity fields demonstrating the sharpness of the classical Zvonkin--Veretennikov--Davie strong well-posedness by noise regime. We consider stochastic differential equations …

AI & Data Science Preprint PDF DOI

Building a GPU-Accelerated Multivariate Statistics Platform

Mike Crowhurst · 2026

Classical multivariate statistical methods such as covariance estimation and principal component analysis are well understood mathematically, yet their application at extreme data scales remains chall…

AI & Data Science Preprint PDF DOI

ELSA: Exact Linear-Scan Attention for Fast and Memory-Light Vision Transformers

Chih-Chung Hsu, Xin-Di Ma, Wo-Ting Liao, Chia-Ming Lee · 2026

Existing attention accelerators often trade exact softmax semantics, depend on fused Tensor Core kernels, or incur sequential depth that limits FP32 throughput on long sequences. We present \textbf{EL…

Computer Science Preprint PDF DOI

Prompt-Unknown Promotion Attacks against LLM-based Sequential Recommender Systems

Yuchuan Zhao, Tong Chen, Junliang Yu, Zongwei Wang, Lizhen Cui, Hongzhi Yin · 2026

Large language model-powered sequential recommender systems (LLM-SRSs) have recently demonstrated remarkable performance, enabling recommendations through prompt-driven inference over user interaction…

Computer Science Preprint PDF DOI

ClusterFusion++: Expanding Cluster-Level Fusion to Full Transformer-Block Decoding

ChiHeng Jin, Hongche Yu, Xihui Chen · 2026

Large language model (LLM) decoding is latency-sensitive and often bottlenecked by fragmented operator execution and repeated off-chip materialization of intermediate tensors. Prior work expands fusio…

AI & Data Science Preprint PDF DOI

Hybrid JIT-CUDA Graph Optimization for Low-Latency Large Language Model Inference

Divakar Kumar Yadav, Tian Zhao · 2026

Large Language Models (LLMs) have achieved strong performance across natural language and multimodal tasks, yet their practical deployment remains constrained by inference latency and kernel launch ov…

AI & Data Science Preprint PDF DOI

Evaluating CUDA Tile for AI Workloads on Hopper and Blackwell GPUs

Divakar Kumar Yadav, Tian Zhao, Deepak Kumar · 2026

NVIDIA's CUDA Tile (CuTile) introduces a Python-based, tile-centric abstraction for GPU kernel development that aims to simplify programming while retaining Tensor Core and Tensor Memory Accelerator (…

Computer Science Preprint PDF DOI

ARCHES: Adaptive Real-Time Switching of AI Models for the RAN

Neagin Neasamoni Santhi, Davide Villa, Michele Polese, Salvatore D'Oro, Yunseong Lee, Koichiro Furueda, Tommaso Melodia · 2026

Artificial Intelligence (AI) has become a powerful tool for model-free Radio Access Network (RAN) signal processing and optimization. However, designing a single model that generalizes across all radi…

Physics Preprint PDF DOI

gateau: an observation simulator for ground-based submillimeter astronomy with integral field units and kinetic inductance detectors

A. Moerman, N. Soshnin, S. A. Brackenhoff, S. O. Dabironezare, K. Karatsu, L. H. Marting, S. A. H. de Rooij, M. Roos, B. R. Brandl, A. Endo · 2026

Submillimeter (submm) integral field units (IFUs) utilising kinetic inductance detectors (KIDs) are a promising instrument architecture for the study of galaxies, galaxy clusters, and the large-scale …
