Expertini Research

Browse Research Papers

16,353+ open-access research outputs.

Showing 16,353 results for "memory" in Computer Science
Computer Science Preprint PDF DOI

Hierarchical Long-Term Semantic Memory for LinkedIn's Hiring Agent

Zhentao Xu, Shangjing Zhang, Emir Poyraz, Yvonne Li, Ye Jin, Xie Lu, Xiaoyang Gu, Karthik Ramgopal, Praveen Kumar Bodigutla, Xiaofeng Wang · 2026

Large Language Model (LLM) agents are increasingly used in real-world products, where personalized and context-aware user interactions are essential. A central enabler of such capabilities is the agen…

Computer Science Preprint PDF DOI

Application-Aware Twin-in-the-Loop Planning for Federated Split Learning over Wireless Edge Networks

Zihao Ding, Beining Wu, Jun Huang, Shiwen Mao · 2026

We investigate task-success-oriented resource allocation for federated split learning (FSL) at the wireless edge. In this setting, the server must jointly determine bandwidth, transmit power, split-la…

Computer Science Preprint PDF DOI

AMMA: A Multi-Chiplet Memory-Centric Architecture for Low-Latency 1M Context Attention Serving

Zhongkai Yu, Haotian Ye, Chenyang Zhou, Ohm Rishabh Venkatachalam, Zaifeng Pan, Zhengding Hu, Junsung Kim, Won Woo Ro, Po-An Tsai, Shuyi Pei, Yangwook Kang, Yufei Ding · 2026

All current LLM serving systems place the GPU at the center, from production-level attention-FFN disaggregation to NVIDIA's Rubin GPU-LPU heterogeneous platform. Even academic PIM/PNM proposals still …

Computer Science Preprint PDF DOI

DAK: Direct-Access-Enabled GPU Memory Offloading with Optimal Efficiency for LLM Inference

Shouxu Lin, Zhiyuan Guo, Jiaxin Lin · 2026

LLM inference is constrained by GPU memory capacity and bandwidth. Tiered memory architectures mitigate this by allowing the GPU to offload memory to the remote tier. However, existing memory offloadi…

Computer Science Preprint PDF DOI

Carbon-Taxed Transformers: A Green Compression Pipeline for Overgrown Language Models

Ajmain Inqiad Alam, Palash Roy, Chanchal K. Roy, Banani Roy, Kevin A. Schneider · 2026

The accelerating adoption of Large Language Models (LLMs) in software engineering (SE) has brought with it a silent crisis: unsustainable computational cost. While these models demonstrate remarkable …

Computer Science Preprint PDF DOI

NVLLM: A 3D NAND-Centric Architecture Enabling Edge on-Device LLM Inference

Mingbo Hao, Changwei Yan, Haoyu Cui, Zhihao Yan, Yizhi Ding, Zhangrui Qian, Weiwei Shan · 2026

The rapid growth of LLMs demands high-throughput, memory-capacity-intensive inference on resource-constrained edge devices, where single-batch decoding remains fundamentally memory-bound. Existing out…

Computer Science Preprint PDF DOI

K-CARE: Knowledge-driven Symmetrical Contextual Anchoring and Analogical Prototype Reasoning for E-commerce Relevance

Chen Yifei, Tian Zhixing, Wang Chenyang, Cheng Ziguang · 2026

This paper targets e-commerce search relevance. While Large Language Models (LLMs) have demonstrated significant potential in this field, they often encounter performance bottlenecks in persistent 'co…

Computer Science Preprint PDF DOI

SimdQuickHeap: The QuickHeap Reconsidered

Johannes Breitling, Ragnar Groot Koerkamp, Marvin Williams · 2026

Priority queues are data structures that maintain a dynamic collection of elements and allow inserting new elements and removing the smallest element. The most widely known and used priority queue is …

Computer Science Preprint PDF DOI

Embedded Rust or C Firmware? Lessons from an Industrial Microcontroller Use Case with Ariel OS

Bipin Thapa, Daniele Alfonso, Lorenzo Bini, Licio Mapelli, Kaspar Schleiser, Romain Fouquet, Emmanuel Baccelli · 2026

As Rust gains traction for developing safer systems software, a reality check for the microcontroller hardware segment becomes necessary. How ready is the Rust ecosystem for this segment? Can Rust com…

Computer Science Preprint PDF DOI

Multibit neural inference in a N-ary crossbar architecture

Anatole Moureaux, Anthony Lopes Temporao, Flavio Abreu Araujo · 2026

In-memory computing (IMC) enables energy-efficient neural network inference by computing analog matrix-vector multiplications (MVM) in memory crossbar arrays. In this work we present a simulation fram…

Computer Science Preprint PDF DOI

SnapGuard: Lightweight Prompt Injection Detection for Screenshot-Based Web Agents

Mengyao Du, Han Fang, Haokai Ma, Jiahao Chen, Kai Xu, Quanjun Yin, Ee-Chien Chang · 2026

Web agents have emerged as an effective paradigm for automating interactions with complex web environments, yet remain vulnerable to prompt injection attacks that embed malicious instructions into web…

Computer Science Preprint PDF DOI

CUDA Kernel Optimization and Counter-Free Performance Analysis for Depthwise Convolution in Cloud Environments

Huriyeh Babak, Melanie Schaller · 2026

Efficient GPU execution of convolution operators is governed by memory-access efficiency, on-chip data reuse, and execution mapping rather than arithmetic throughput alone. This paper presents a contr…

Computer Science Preprint PDF DOI

An Efficient Streaming Algorithm for Approximating Graphlet Distributions

Marco Bressan, T-H. Hubert Chan, Qipeng Kuang, Mauro Sozio · 2026

In recent years, the problem of computing the frequencies of the induced $k$-vertex subgraphs of a graph, or \emph{$k$-graphlets}, has become central. One approach for this problem is to sample $k$-gr…

Computer Science Preprint PDF DOI

TetrisG-SDK: Efficient Convolutional Layer Mapping with Adaptive Windows and Grouped Convolutions for Fast In-Memory Computing

Ke Dong, Kejie Huang, Tao Luo, Bo Wang · 2026

Shifted-and-Duplicated-Kernel (SDK) mapping has emerged as an effective strategy to accelerate convolutional layers on compute-in-memory (CIM) hardware. However, existing SDK variants (e.g., VWC-SDK) …

Computer Science Preprint PDF DOI

RecFlash: Fast Recommendation System on In-Storage Computing with Frequency-Based Data Mapping

Jangho Baik, Sunghyun Kim, Gisan Ji, Wonbo Shim, Sungju Ryu · 2026

Recommendation system has gained a large popularity for a variety of personalized suggestion tasks, but the ever-increasing number of user data makes real-time processing of recommendation systems dif…

Computer Science Preprint PDF DOI

FusionCIM: Accelerating LLM Inference with Fusion-Driven Computing-in-Memory Architecture

Zihao Xuan, Jia Chen, Yewen Li, Wei Xuan, Hegan Chen, Xiao Huo, Fengbin Tu · 2026

In this paper, we propose FusionCIM, an operator-fusion-driven compute-in-memory (CIM) accelerator architecture for efficient and scalable LLM inference, with three key innovations: (1) a hybrid CIM p…

Computer Science Preprint PDF DOI

Hardware Generation and Exploration of Lookup Table-Based Accelerators for 1.58-bit LLM Inference

Robin Geens, Joran Heldens, Joren Dumoulin, Marian Verhelst · 2026

Ternary weight quantization (e.g., BitNet b1.58) offers a promising path to mitigate the memory bandwidth bottleneck in Large Language Model (LLM) inference. However, conventional compute platforms la…

Computer Science Preprint PDF DOI

CacheFlow: Efficient LLM Serving with 3D-Parallel KV Cache Restoration

Sean Nian, Jiahao Fang, Qilong Feng, Zhiyu Wu, Fan Lai · 2026

KV cache restoration has emerged as a dominant bottleneck in serving long-context LLM workloads, including multi-turn conversations, retrieval-augmented generation, and agentic pipelines. Existing app…

Computer Science Preprint PDF DOI

AFA: Identity-Aware Memory for Preventing Persona Confusion in Multi-User Dialogue

Mohammad Al-Ratrout, Pavan Uttej Ravva, Shayla Sharmin, Aditya Raikwar, Ju Young Shin, Roghayeh Leila Barmaki · 2026

When multiple people share a single voice assistant, the system conflates their histories: one resident's preferences can leak into another's responses, eroding utility and trust. We call this failure…

Computer Science Preprint PDF DOI

AgentWard: A Lifecycle Security Architecture for Autonomous AI Agents

Yixiang Zhang, Xinhao Deng, Jiaqing Wu, Yue Xiao, Ke Xu, Qi Li · 2026

Autonomous AI agents extend large language models into full runtime systems that load skills, ingest external content, maintain memory, plan multi-step actions, and invoke privileged tools. In such sy…
