Memory in Computer Science — Research Repository

Computer Science Preprint PDF DOI

Hierarchical Long-Term Semantic Memory for LinkedIn's Hiring Agent

Zhentao Xu, Shangjing Zhang, Emir Poyraz, Yvonne Li, Ye Jin, Xie Lu, Xiaoyang Gu, Karthik Ramgopal, Praveen Kumar Bodigutla, Xiaofeng Wang · 2026

Large Language Model (LLM) agents are increasingly used in real-world products, where personalized and context-aware user interactions are essential. A central enabler of such capabilities is the agen…


Application-Aware Twin-in-the-Loop Planning for Federated Split Learning over Wireless Edge Networks

Zihao Ding, Beining Wu, Jun Huang, Shiwen Mao · 2026

We investigate task-success-oriented resource allocation for federated split learning (FSL) at the wireless edge. In this setting, the server must jointly determine bandwidth, transmit power, split-la…


AMMA: A Multi-Chiplet Memory-Centric Architecture for Low-Latency 1M Context Attention Serving

Zhongkai Yu, Haotian Ye, Chenyang Zhou, Ohm Rishabh Venkatachalam, Zaifeng Pan, Zhengding Hu, Junsung Kim, Won Woo Ro, Po-An Tsai, Shuyi Pei, Yangwook Kang, Yufei Ding · 2026

All current LLM serving systems place the GPU at the center, from production-level attention-FFN disaggregation to NVIDIA's Rubin GPU-LPU heterogeneous platform. Even academic PIM/PNM proposals still …


DAK: Direct-Access-Enabled GPU Memory Offloading with Optimal Efficiency for LLM Inference

Shouxu Lin, Zhiyuan Guo, Jiaxin Lin · 2026

LLM inference is constrained by GPU memory capacity and bandwidth. Tiered memory architectures mitigate this by allowing the GPU to offload memory to the remote tier. However, existing memory offloadi…


Carbon-Taxed Transformers: A Green Compression Pipeline for Overgrown Language Models

Ajmain Inqiad Alam, Palash Roy, Chanchal K. Roy, Banani Roy, Kevin A. Schneider · 2026

The accelerating adoption of Large Language Models (LLMs) in software engineering (SE) has brought with it a silent crisis: unsustainable computational cost. While these models demonstrate remarkable …


NVLLM: A 3D NAND-Centric Architecture Enabling Edge on-Device LLM Inference

Mingbo Hao, Changwei Yan, Haoyu Cui, Zhihao Yan, Yizhi Ding, Zhangrui Qian, Weiwei Shan · 2026

The rapid growth of LLMs demands high-throughput, memory-capacity-intensive inference on resource-constrained edge devices, where single-batch decoding remains fundamentally memory-bound. Existing out…


K-CARE: Knowledge-driven Symmetrical Contextual Anchoring and Analogical Prototype Reasoning for E-commerce Relevance

Chen Yifei, Tian Zhixing, Wang Chenyang, Cheng Ziguang · 2026

This paper targets e-commerce search relevance. While Large Language Models (LLMs) have demonstrated significant potential in this field, they often encounter performance bottlenecks in persistent 'co…


SimdQuickHeap: The QuickHeap Reconsidered

Johannes Breitling, Ragnar Groot Koerkamp, Marvin Williams · 2026

Priority queues are data structures that maintain a dynamic collection of elements and allow inserting new elements and removing the smallest element. The most widely known and used priority queue is …


Embedded Rust or C Firmware? Lessons from an Industrial Microcontroller Use Case with Ariel OS

Bipin Thapa, Daniele Alfonso, Lorenzo Bini, Licio Mapelli, Kaspar Schleiser, Romain Fouquet, Emmanuel Baccelli · 2026

As Rust gains traction for developing safer systems software, a reality check for the microcontroller hardware segment becomes necessary. How ready is the Rust ecosystem for this segment? Can Rust com…


Multibit neural inference in a N-ary crossbar architecture

Anatole Moureaux, Anthony Lopes Temporao, Flavio Abreu Araujo · 2026

In-memory computing (IMC) enables energy-efficient neural network inference by computing analog matrix-vector multiplications (MVM) in memory crossbar arrays. In this work we present a simulation fram…


SnapGuard: Lightweight Prompt Injection Detection for Screenshot-Based Web Agents

Mengyao Du, Han Fang, Haokai Ma, Jiahao Chen, Kai Xu, Quanjun Yin, Ee-Chien Chang · 2026

Web agents have emerged as an effective paradigm for automating interactions with complex web environments, yet remain vulnerable to prompt injection attacks that embed malicious instructions into web…


CUDA Kernel Optimization and Counter-Free Performance Analysis for Depthwise Convolution in Cloud Environments

Huriyeh Babak, Melanie Schaller · 2026

Efficient GPU execution of convolution operators is governed by memory-access efficiency, on-chip data reuse, and execution mapping rather than arithmetic throughput alone. This paper presents a contr…


An Efficient Streaming Algorithm for Approximating Graphlet Distributions

Marco Bressan, T-H. Hubert Chan, Qipeng Kuang, Mauro Sozio · 2026

In recent years, the problem of computing the frequencies of the induced $k$-vertex subgraphs of a graph, or $k$-graphlets, has become central. One approach for this problem is to sample $k$-gr…


TetrisG-SDK: Efficient Convolutional Layer Mapping with Adaptive Windows and Grouped Convolutions for Fast In-Memory Computing

Ke Dong, Kejie Huang, Tao Luo, Bo Wang · 2026

Shifted-and-Duplicated-Kernel (SDK) mapping has emerged as an effective strategy to accelerate convolutional layers on compute-in-memory (CIM) hardware. However, existing SDK variants (e.g., VWC-SDK) …


RecFlash: Fast Recommendation System on In-Storage Computing with Frequency-Based Data Mapping

Jangho Baik, Sunghyun Kim, Gisan Ji, Wonbo Shim, Sungju Ryu · 2026

Recommendation systems have gained wide popularity for a variety of personalized suggestion tasks, but the ever-increasing volume of user data makes real-time processing of recommendation systems dif…


FusionCIM: Accelerating LLM Inference with Fusion-Driven Computing-in-Memory Architecture

Zihao Xuan, Jia Chen, Yewen Li, Wei Xuan, Hegan Chen, Xiao Huo, Fengbin Tu · 2026

In this paper, we propose FusionCIM, an operator-fusion-driven compute-in-memory (CIM) accelerator architecture for efficient and scalable LLM inference, with three key innovations: (1) a hybrid CIM p…


Hardware Generation and Exploration of Lookup Table-Based Accelerators for 1.58-bit LLM Inference

Robin Geens, Joran Heldens, Joren Dumoulin, Marian Verhelst · 2026

Ternary weight quantization (e.g., BitNet b1.58) offers a promising path to mitigate the memory bandwidth bottleneck in Large Language Model (LLM) inference. However, conventional compute platforms la…


CacheFlow: Efficient LLM Serving with 3D-Parallel KV Cache Restoration

Sean Nian, Jiahao Fang, Qilong Feng, Zhiyu Wu, Fan Lai · 2026

KV cache restoration has emerged as a dominant bottleneck in serving long-context LLM workloads, including multi-turn conversations, retrieval-augmented generation, and agentic pipelines. Existing app…


AFA: Identity-Aware Memory for Preventing Persona Confusion in Multi-User Dialogue

Mohammad Al-Ratrout, Pavan Uttej Ravva, Shayla Sharmin, Aditya Raikwar, Ju Young Shin, Roghayeh Leila Barmaki · 2026

When multiple people share a single voice assistant, the system conflates their histories: one resident's preferences can leak into another's responses, eroding utility and trust. We call this failure…


AgentWard: A Lifecycle Security Architecture for Autonomous AI Agents

Yixiang Zhang, Xinhao Deng, Jiaqing Wu, Yue Xiao, Ke Xu, Qi Li · 2026

Autonomous AI agents extend large language models into full runtime systems that load skills, ingest external content, maintain memory, plan multi-step actions, and invoke privileged tools. In such sy…
