Sarah Scheffler in Computer Science — Research Repository

Computer Science Preprint PDF DOI

RAG-Enhanced Kernel-Based Heuristic Synthesis (RKHS): A Structured Methodology Using Large Language Models for Hardware Design

Shiva Ahir, Alex Doboli · 2026

Heuristic design underpins modern electronic design automation (EDA) tools, yet crafting effective placement, routing, and scheduling strategies requires substantial expertise. We study how large languag…

NeuralEmu: in situ Measurement-Driven, ML-based, High-Fidelity 5G Network Emulation

Haoran Wan, Yaxiong Xie, Kyle Jamieson · 2026

Current and future applications demand ultra-low latency and consistent throughput, yet they frequently traverse 5G cellular networks and so must cope with volatile packet dynamics, as 5G base station schedulers d…

NVLLM: A 3D NAND-Centric Architecture Enabling Edge on-Device LLM Inference

Mingbo Hao, Changwei Yan, Haoyu Cui, Zhihao Yan, Yizhi Ding, Zhangrui Qian, Weiwei Shan · 2026

The rapid growth of LLMs demands high-throughput, memory-capacity-intensive inference on resource-constrained edge devices, where single-batch decoding remains fundamentally memory-bound. Existing out…

CacheFlow: Efficient LLM Serving with 3D-Parallel KV Cache Restoration

Sean Nian, Jiahao Fang, Qilong Feng, Zhiyu Wu, Fan Lai · 2026

KV cache restoration has emerged as a dominant bottleneck in serving long-context LLM workloads, including multi-turn conversations, retrieval-augmented generation, and agentic pipelines. Existing app…

FEPLB: Exploiting Copy Engines for Nearly Free MoE Load Balancing in Distributed Training

Shuyao Qi, Haoyuan Liu, Shizhen Zhao · 2026

Fine-grained, per-micro-batch load balancing is essential for efficient Mixture-of-Experts (MoE) training, yet every prior dynamic scheduling scheme pays for it with extra communication that is hard t…

HybridGen: Efficient LLM Generative Inference via CPU-GPU Hybrid Computing

Mao Lin, Xi Wang, Guilherme Cox, Dong Li, Hyeran Jeon · 2026

As modern LLMs support thousands to millions of tokens, KV caches grow to hundreds of gigabytes, stressing memory capacity and bandwidth. Existing solutions, such as KV cache pruning and offloading, a…

MASFuzzer: Fuzz Driver Generation and Adaptive Scheduling via Multidimensional API Sequences

Xingyu Liu, Zengqin Huang, Xiang Gao, Hailong Sun · 2026

Fuzz testing of software libraries relies on fuzz drivers to invoke library APIs. Traditionally, these drivers are written manually by developers - a process that is time-consuming and often inadequat…

Towards Energy Efficient Co-Scheduling in HPC

Zhong Zheng, Michael E. Papka, Zhiling Lan · 2026

Modern multi-GPU HPC systems expose substantial computational capacity, yet inefficient GPU allocation often leads to wasted energy and underutilization. In practice, GPU applications exhibit heteroge…

Sarus Suite: Cloud-native Containers for HPC

Alberto Madonna, Matteo Chesi, Gwangmu Lee, Michele Brambilla, Fawzi Roberto Mohamed, Felipe A. Cruz · 2026

High-performance computing (HPC) systems must support fast-moving software stacks, especially in AI/ML, while preserving scheduler control, scalable startup, and production performance. Yet many HPC c…

Efficient calculation of available space for multi-NUMA virtual machines

Andrei Gudkov, Elizaveta Ponomareva, Alexis Pospelov · 2026

Increasing demand for computational power has led cloud providers to employ multi-NUMA servers and offer multi-NUMA virtual machines to their customers. However, multi-NUMA VMs introduce additional co…

Nautilus: An Auto-Scheduling Tensor Compiler for Efficient Tiled GPU Kernels

Yifan Zhao, Yuchen Yang, Matei Budiu, Sasa Misailovic · 2026

We present Nautilus, a novel tensor compiler that moves toward fully automated math-to-kernel optimization. Nautilus compiles a high-level algebraic specification of tensor operators into efficient ti…

Fast Concurrent Primitives Despite Contention

Michael A. Bender, Guy E. Blelloch, Martin Farach-Colton, Yang Hu, Rob Johnson, Rotem Oshman, Renfei Zhou · 2026

We study the problem of constructing concurrent objects in a setting where $P$ processes run in parallel and interact through a shared memory that is subject to write contention. Our goal is to transf…

MARS: Efficient, Adaptive Co-Scheduling for Heterogeneous Agentic Systems

Yifei Wang, Hancheng Ye, Yechen Xu, Cong Guo, Chiyue Wei, Qinsi Wang, Dongting Li, Tingjun Chen, Hai "Helen" Li, Danyang Zhuo, Yiran Chen · 2026

Large language models (LLMs) are increasingly deployed as the execution core of autonomous agents rather than as standalone text generators. Agentic workloads induce a temporal shift from single-turn …

ReRec: Reasoning-Augmented LLM-based Recommendation Assistant via Reinforcement Fine-tuning

Jiani Huang, Shijie Wang, Liangbo Ning, Wenqi Fan, Qing Li · 2026

With the rise of LLMs, there is an increasing need for intelligent recommendation assistants that can handle complex queries and provide personalized, reasoning-driven recommendations. LLM-based recom…

CBM-Dual: A 65-nm Fully Connected Chaotic Boltzmann Machine Processor for Dual Function Simulated Annealing and Reservoir Computing

Kanta Yoshioka, Soshi Hirayae, Yuichiro Tanaka, Yuichi Katori, Takashi Morie, Hakaru Tamukoh · 2026

This paper presents CBM-Dual, the first silicon-proven digital chaotic dynamics processor (CDP) supporting both simulated annealing (SA) and reservoir computing (RC). CBM-Dual enables real-time decisi…

PHAROS: Pipelined Heterogeneous Accelerators for Real-time Safety-critical Systems With Deadline Compliance

Shixin Ji, Jinming Zhuang, Sarah Schultz, Zhuoping Yang, Xingzhen Chen, Zheng Dong, Alex K. Jones, Yihui Ren, Peipei Zhou · 2026

Spatially partitioned heterogeneous accelerators (HAs) are increasingly adopted in embedded systems for their performance and flexibility. Yet most existing HA design frameworks optimize primarily for…

GENSERVE: Efficient Co-Serving of Heterogeneous Diffusion Model Workloads

Fanjiang Ye, Zhangke Li, Xinrui Zhong, Ethan Ma, Russell Chen, Kaijian Wang, Jingwei Zuo, Desen Sun, Ye Cao, Triston Cao, Myungjin Lee, Arvind Krishnamurthy, Yuke Wang · 2026

Diffusion models have emerged as the prevailing approach for text-to-image (T2I) and text-to-video (T2V) generation, yet production platforms must increasingly serve both modalities on shared GPU clus…

Perils of Parallelism: Transaction Fee Mechanisms under Execution Uncertainty

Sarisht Wadhwa, Aviv Yaish, Fan Zhang, Kartik Nayak · 2026

Modern blockchains increasingly rely on parallel execution to improve throughput. We show several industry and academic transaction fee mechanisms (TFMs) struggle to simultaneously account for executi…

Fusion and Alignment Enhancement with Large Language Models for Tail-item Sequential Recommendation

Zhifu Wei, Yizhou Dang, Guibing Guo, Chuang Zhao, Zhu Sun · 2026

Sequential Recommendation (SR) learns user preferences from their historical interaction sequences and provides personalized suggestions. In real-world scenarios, most items exhibit sparse interaction…

TokenDance: Scaling Multi-Agent LLM Serving via Collective KV Cache Sharing

Zhuohang Bian, Feiyang Wu, Chengrui Zhang, Hangcheng Dong, Yun Liang, Youwei Zhuo · 2026

Multi-agent LLM applications organize execution in synchronized rounds where a central scheduler gathers outputs from all agents and redistributes the combined context. This All-Gather communication p…
