Computer Science Preprint PDF DOI

RCW-CIM: A Digital CIM-based LLM Accelerator with Read-Compute/Write

Yan-Cheng Guo, Tian-Sheuan Chang, Jian-Wei Su · 2026

Digital computing-in-memory (DCIM) has emerged as a promising solution for large language model (LLM) acceleration by minimizing data transfers between external DRAM and on-chip accelerators while mai…

Exploring the Efficiency of 3D-Stacked AI Chip Architecture for LLM Inference with Voxel

Yiqi Liu, Noelle Crawford, Michael Wang, Jilong Xue, Jian Huang · 2026

To overcome the well-known memory bottleneck of AI chips, 3D stacked architectures that employ advanced packaging technology with high-density through-silicon vias (TSVs) pins have proven to be a prom…

What Is the Cost of Energy Monitoring? An Empirical Study on the Overhead of RAPL-Based Tools

Jeremy Diamond, Vincenzo Stoico · 2026

The Running Average Power Limit (RAPL) interface is widely used to estimate software energy consumption via CPU and DRAM counters, but tool design differences and high-frequency polling can introduce …

NVLLM: A 3D NAND-Centric Architecture Enabling Edge on-Device LLM Inference

Mingbo Hao, Changwei Yan, Haoyu Cui, Zhihao Yan, Yizhi Ding, Zhangrui Qian, Weiwei Shan · 2026

The rapid growth of LLMs demands high-throughput, memory-capacity-intensive inference on resource-constrained edge devices, where single-batch decoding remains fundamentally memory-bound. Existing out…

AHASD: Asynchronous Heterogeneous Architecture for LLM Adaptive Drafting Speculative Decoding on Mobile Devices

Ma Zirui, Fan Zhihua, Li Wenxing, Wu Haibin, Zhang Fulin, Ye Xiaochun, Li Wenming · 2026

Speculative decoding enhances the inference efficiency of large language models (LLMs) by generating drafts using a small draft language model (DLM) and verifying them in batches with a large target l…

RowHammer Vulnerability Counter (RVC): Redefining RowHammer Detection with Victim-Centric Tracking

Lavi Jain, Venkata Kalyan Tavva · 2026

The RowHammer vulnerability poses an increasing challenge with newer generations of DRAM and aggressive technology scaling. Existing mitigation techniques, such as Graphene, TWiCe, and Hydra, primaril…

Tessera: Secure, Near-Line-Rate Weight Streaming for UMA Edge Accelerators

Animan Naskar · 2026

Deploying proprietary Deep Neural Networks (DNNs) on commodity edge devices demands hardware-backed Digital Rights Management (DRM) capable of withstanding both software-level and physical adversaries…

PVAC: A RowHammer Mitigation Architecture Exploiting Per-victim-row Counting

Jumin Kim, Seungmin Baek, Hwayong Nam, Minbok Wi, Nam Sung Kim, Jung Ho Ahn · 2026

As DRAM scaling exacerbates RowHammer, DDR5 introduces per-row activation counting (PRAC) to track aggressor activity. However, PRAC indiscriminately increments counters on every activation -- includi…

Efficient Page Migration in Hybrid Memory Systems

Upasna, Venkata Kalyan Tavva · 2026

Heterogeneous Memory Architecture (HMA) aims to optimize memory usage by leveraging a combination of memory types, such as high-bandwidth memory (HBM), commodity DRAM, and non-volatile memory (NVM), w…

DPC: A Distributed Page Cache over CXL

Shai Bergman, Zhe Yang, Julien Eudine, Giorgio Negro, Onur Mutlu, Arash Tavakkol, Ji Zhang · 2026

Modern distributed file systems rely on uncoordinated, per-node page caches that replicate hot data locally across the cluster. While ensuring fast local access, this architecture underutilizes aggreg…

Allow Me Into Your Dream: A Handshake-and-Pull Protocol for Sharing Mixed Realities in Spontaneous Encounters

Botao Amber Hu, Yilan Elan Tao, Bernhard Riecke, Yue Li · 2026

Mixed reality systems support shared anchors and co-located interaction, yet they lack a socially legible protocol for entering another person's mixed reality in public settings. We frame this as a pr…

Predictive Multi-Tier Memory Management for KV Cache in Large-Scale GPU Inference

Sanjeev Rao Ganjihal · 2026

Key-value (KV) cache memory management is the primary bottleneck limiting throughput and cost-efficiency in large-scale GPU inference serving. Current systems suffer from three compounding inefficienc…

Bit-Flip Vulnerability of Shared KV-Cache Blocks in LLM Serving Systems

Yuji Yamamoto, Satoshi Matsuura · 2026

Rowhammer on GPU DRAM has enabled adversarial bit flips in model weights; shared KV-cache blocks in LLM serving systems present an analogous but previously unexamined target. In vLLM's Prefix Caching,…

PoSME: Proof of Sequential Memory Execution via Latency-Bound Pointer Chasing with Causal Hash Binding

David L. Condrey · 2026

We introduce PoSME (Proof of Sequential Memory Execution), a cryptographic primitive that enforces sustained sequential computation via latency-bound pointer chasing over a mutable arena. Each step re…

Accelerating CRONet on AMD Versal AIE-ML Engines

Kaustubh Mhatre, Vedant Tewari, Aditya Ray, Farhan Khan, Ridwan Olabiyi, Ashif Iquebal, Aman Arora · 2026

Topology optimization is a computational method used to determine the optimal material distribution within a prescribed design domain, aiming to minimize structural weight while satisfying load and bo…

Parallel R-tree-based Spatial Query Processing on a Commercial Processing-in-Memory System

Tasmia Jannat, Michael Gowanlock, Satish Puri · 2026

The growing volume of data in scientific domains has made spatial query processing increasingly challenging due to high data transfer costs across the memory hierarchy and limited memory bandwidth. To…

EdgeCIM: A Hardware-Software Co-Design for CIM-Based Acceleration of Small Language Models

Jinane Bazzi, Mariam Rakka, Fadi Kurdahi, Mohammed E. Fouda, Ahmed Eltawil · 2026

The growing demand for deploying Small Language Models (SLMs) on edge devices, including laptops, smartphones, and embedded platforms, has exposed fundamental inefficiencies in existing accelerators. …

A Full-Stack Performance Evaluation Infrastructure for 3D-DRAM-based LLM Accelerators

Cong Li, Chenhao Xue, Yi Ren, Xiping Dong, Yu Cheng, Yinbo Hu, Fujun Bai, Yixin Guo, Xiping Jiang, Qiang Wu, Zhi Yang, Zhe Cheng, Yuan Xie, Guangyu Sun · 2026

Large language models (LLMs) exhibit memory-intensive behavior during decoding, making it a key bottleneck in LLM inference. To accelerate decoding execution, hybrid-bonding-based 3D-DRAM has been ado…

SHIELD: A Segmented Hierarchical Memory Architecture for Energy-Efficient LLM Inference on Edge NPUs

Jintao Zhang, Xuanyao Fong · 2026

Large Language Model (LLM) inference on edge Neural Processing Units (NPUs) is fundamentally constrained by limited on-chip memory capacity. Although high-density embedded DRAM (eDRAM) is attractive f…

A comparative study on power delivery aspects of compute-in/near-memory approaches using DRAM

Siddhartha Raman Sundara Raman, Siyuan Ma, Lizy Kurian John · 2026

Processing-in-memory (PIM) mitigates the memory wall by performing computation within memory, reducing data movement and improving energy efficiency. DRAM-based PIM is particularly attractive due to its …
