Expertini Research

Browse Research Papers

891+ open-access research outputs.

Showing 891 results for "durham)" in Computer Science
Computer Science Preprint PDF DOI

RCW-CIM: A Digital CIM-based LLM Accelerator with Read-Compute/Write

Yan-Cheng Guo, Tian-Sheuan Chang, Jian-Wei Su · 2026

Digital computing-in-memory (DCIM) has emerged as a promising solution for large language model (LLM) acceleration by minimizing data transfers between external DRAM and on-chip accelerators while mai…

Exploring the Efficiency of 3D-Stacked AI Chip Architecture for LLM Inference with Voxel

Yiqi Liu, Noelle Crawford, Michael Wang, Jilong Xue, Jian Huang · 2026

To overcome the well-known memory bottleneck of AI chips, 3D stacked architectures that employ advanced packaging technology with high-density through-silicon vias (TSVs) pins have proven to be a prom…

What Is the Cost of Energy Monitoring? An Empirical Study on the Overhead of RAPL-Based Tools

Jeremy Diamond, Vincenzo Stoico · 2026

The Running Average Power Limit (RAPL) interface is widely used to estimate software energy consumption via CPU and DRAM counters, but tool design differences and high-frequency polling can introduce …

NVLLM: A 3D NAND-Centric Architecture Enabling Edge on-Device LLM Inference

Mingbo Hao, Changwei Yan, Haoyu Cui, Zhihao Yan, Yizhi Ding, Zhangrui Qian, Weiwei Shan · 2026

The rapid growth of LLMs demands high-throughput, memory-capacity-intensive inference on resource-constrained edge devices, where single-batch decoding remains fundamentally memory-bound. Existing out…

AHASD: Asynchronous Heterogeneous Architecture for LLM Adaptive Drafting Speculative Decoding on Mobile Devices

Ma Zirui, Fan Zhihua, Li Wenxing, Wu Haibin, Zhang Fulin, Ye Xiaochun, Li Wenming · 2026

Speculative decoding enhances the inference efficiency of large language models (LLMs) by generating drafts using a small draft language model (DLM) and verifying them in batches with a large target l…

RowHammer Vulnerability Counter (RVC): Redefining RowHammer Detection with Victim-Centric Tracking

Lavi Jain, Venkata Kalyan Tavva · 2026

The Rowhammer vulnerability poses an increasing challenge with newer generations of DRAM and aggressive technology scaling. Existing mitigation techniques, such as Graphene, Twice, and Hydra, primaril…

Tessera: Secure, Near-Line-Rate Weight Streaming for UMA Edge Accelerators

Animan Naskar · 2026

Deploying proprietary Deep Neural Networks (DNNs) on commodity edge devices demands hardware-backed Digital Rights Management (DRM) capable of withstanding both software-level and physical adversaries…

PVAC: A RowHammer Mitigation Architecture Exploiting Per-victim-row Counting

Jumin Kim, Seungmin Baek, Hwayong Nam, Minbok Wi, Nam Sung Kim, Jung Ho Ahn · 2026

As DRAM scaling exacerbates RowHammer, DDR5 introduces per-row activation counting (PRAC) to track aggressor activity. However, PRAC indiscriminately increments counters on every activation -- includi…

Efficient Page Migration in Hybrid Memory Systems

Upasna, Venkata Kalyan Tavva · 2026

Heterogeneous Memory Architecture (HMA) aims to optimize memory usage by leveraging a combination of memory types, such as high-bandwidth memory (HBM), commodity DRAM, and non-volatile memory (NVM), w…

DPC: A Distributed Page Cache over CXL

Shai Bergman, Zhe Yang, Julien Eudine, Giorgio Negro, Onur Mutlu, Arash Tavakkol, Ji Zhang · 2026

Modern distributed file systems rely on uncoordinated, per node page caches that replicate hot data locally across the cluster. While ensuring fast local access, this architecture underutilizes aggreg…

Allow Me Into Your Dream: A Handshake-and-Pull Protocol for Sharing Mixed Realities in Spontaneous Encounters

Botao Amber Hu, Yilan Elan Tao, Bernhard Riecke, Yue Li · 2026

Mixed reality systems support shared anchors and co-located interaction, yet they lack a socially legible protocol for entering another person's mixed reality in public settings. We frame this as a pr…

Predictive Multi-Tier Memory Management for KV Cache in Large-Scale GPU Inference

Sanjeev Rao Ganjihal · 2026

Key-value (KV) cache memory management is the primary bottleneck limiting throughput and cost-efficiency in large-scale GPU inference serving. Current systems suffer from three compounding inefficienc…

Bit-Flip Vulnerability of Shared KV-Cache Blocks in LLM Serving Systems

Yuji Yamamoto, Satoshi Matsuura · 2026

Rowhammer on GPU DRAM has enabled adversarial bit flips in model weights; shared KV-cache blocks in LLM serving systems present an analogous but previously unexamined target. In vLLM's Prefix Caching,…

PoSME: Proof of Sequential Memory Execution via Latency-Bound Pointer Chasing with Causal Hash Binding

David L. Condrey · 2026

We introduce PoSME (Proof of Sequential Memory Execution), a cryptographic primitive that enforces sustained sequential computation via latency-bound pointer chasing over a mutable arena. Each step re…

Accelerating CRONet on AMD Versal AIE-ML Engines

Kaustubh Mhatre, Vedant Tewari, Aditya Ray, Farhan Khan, Ridwan Olabiyi, Ashif Iquebal, Aman Arora · 2026

Topology optimization is a computational method used to determine the optimal material distribution within a prescribed design domain, aiming to minimize structural weight while satisfying load and bo…

Parallel R-tree-based Spatial Query Processing on a Commercial Processing-in-Memory System

Tasmia Jannat, Michael Gowanlock, Satish Puri · 2026

The growing volume of data in scientific domains has made spatial query processing increasingly challenging due to high data transfer costs across the memory hierarchy and limited memory bandwidth. To…

EdgeCIM: A Hardware-Software Co-Design for CIM-Based Acceleration of Small Language Models

Jinane Bazzi, Mariam Rakka, Fadi Kurdahi, Mohammed E. Fouda, Ahmed Eltawil · 2026

The growing demand for deploying Small Language Models (SLMs) on edge devices, including laptops, smartphones, and embedded platforms, has exposed fundamental inefficiencies in existing accelerators. …

A Full-Stack Performance Evaluation Infrastructure for 3D-DRAM-based LLM Accelerators

Cong Li, Chenhao Xue, Yi Ren, Xiping Dong, Yu Cheng, Yinbo Hu, Fujun Bai, Yixin Guo, Xiping Jiang, Qiang Wu, Zhi Yang, Zhe Cheng, Yuan Xie, Guangyu Sun · 2026

Large language models (LLMs) exhibit memory-intensive behavior during decoding, making it a key bottleneck in LLM inference. To accelerate decoding execution, hybrid-bonding-based 3D-DRAM has been ado…

SHIELD: A Segmented Hierarchical Memory Architecture for Energy-Efficient LLM Inference on Edge NPUs

Jintao Zhang, Xuanyao Fong · 2026

Large Language Model (LLM) inference on edge Neural Processing Units (NPUs) is fundamentally constrained by limited on-chip memory capacity. Although high-density embedded DRAM (eDRAM) is attractive f…

A comparative study on power delivery aspects of compute-in/near-memory approaches using DRAM

Siddhartha Raman Sundara Raman, Siyuan Ma, Lizy Kurian John · 2026

Compute-in-memory (PIM) mitigates the memory wall by performing computation within memory, reducing data movement and improving energy efficiency. DRAM-based PIM is particularly attractive due to its …
