Expertini Research

Browse Research Papers

16,353+ open-access research outputs.

Showing 16353 results for "memory" in Computer Science
Computer Science Preprint PDF DOI

FlashRT: Towards Computationally and Memory Efficient Red-Teaming for Prompt Injection and Knowledge Corruption

Yanting Wang, Chenlong Yin, Ying Chen, Jinyuan Jia · 2026

Long-context large language models (LLMs), for example Gemini-3.1-Pro and Qwen-3.5, are widely used to empower many real-world applications, such as retrieval-augmented generation, autonomous agents, a…

Computer Science Preprint PDF DOI

Efficient Multivector Retrieval with Token-Aware Clustering and Hierarchical Indexing

Silvio Martinico, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini · 2026

Multivector retrieval models achieve state-of-the-art effectiveness through fine-grained token-level representations, but their deployment incurs substantial computational and memory costs. Current so…

Computer Science Preprint PDF DOI

Succinct Graph Representations and Algorithmic Applications

Ahammed Ullah, Alex Pothen · 2026

We propose new graph representations that exploit dense local structure to improve time and space simultaneously. Given an undirected graph $G$, we define a dual clique cover (DCC) representation of $…

Computer Science Preprint PDF DOI

Exploring Sparse Matrix Multiplication Kernels on the Cerebras CS-3

Milan Shah, Sheng Di, Michela Becchi · 2026

In recent years, novel AI accelerators have emerged as promising alternatives to GPU for AI model training and inference tasks. One such accelerator, the Cerebras CS-3, achieves strong performance on …

Computer Science Preprint PDF DOI

Affinity Tailor: Dynamic Locality-Aware Scheduling at Scale

Jin Xin Ng, Ori Livneh, Richard O'Grady, Josh Don, Peng Ding, Samuel Grossman, Luis Otero, Chris Kennelly, David Lo, Carlos Villavieja · 2026

Modern large multicore systems often run multiple workloads that share CPUs under schedulers such as Linux CFS. To keep CPUs busy, these schedulers load-balance runnable work, causing each workload to…

Computer Science Preprint PDF DOI

ZipCCL: Efficient Lossless Data Compression of Communication Collectives for Accelerating LLM Training

Wenxiang Lin, Xinglin Pan, Ruibo Fan, Shaohuai Shi, Xiaowen Chu · 2026

Communication has emerged as a critical bottleneck in the distributed training of large language models (LLMs). While numerous approaches have been proposed to reduce communication overhead, the poten…

Computer Science Preprint PDF DOI

AME-PIM: Can Memory be Your Next Tensor Accelerator?

Emanuele Venieri, Simone Manoni, Alberto Florian, Jaehyun Park, Kyomin Sohn, Andrea Bartolini · 2026

High Bandwidth Memory with Processing-in-Memory (HBM-PIM) offers an opportunity to reduce data movement by executing computation directly inside memory, but current commercial platforms expose limited…

Computer Science Preprint PDF DOI

VitaLLM: A Versatile, Ultra-Compact Ternary LLM Accelerator with Dependency-Aware Scheduling

Zi-Wei Lin, Tian-Sheuan Chang · 2026

Deploying Large Language Models (LLMs) on resource-constrained edge devices faces critical bottlenecks in memory bandwidth and power consumption. While ternary quantization (e.g., BitNet b1.58) signif…

Computer Science Preprint PDF DOI

RCW-CIM: A Digital CIM-based LLM Accelerator with Read-Compute/Write

Yan-Cheng Guo, Tian-Sheuan Chang, Jian-Wei Su · 2026

Digital computing-in-memory (DCIM) has emerged as a promising solution for large language model (LLM) acceleration by minimizing data transfers between external DRAM and on-chip accelerators while mai…

Computer Science Preprint PDF DOI

Low-Complexity Run-Length-Limited ISI-Mitigation (RLIM) Codes for Molecular Communication

Melih Sahin, Ozgur B. Akan · 2026

Molecular communication suffers from severe inter-symbol interference, which makes constrained coding essential for reliable transmission. Run-length-limited ISI-mitigation codes are attractive becaus…

Computer Science Preprint PDF DOI

Efficient Training on Multiple Consumer GPUs with RoundPipe

Yibin Luo, Shiwei Gao, Huichuan Zheng, Youyou Lu, Jiwu Shu · 2026

Fine-tuning Large Language Models (LLMs) on consumer-grade GPUs is highly cost-effective, yet constrained by limited GPU memory and slow PCIe interconnects. Pipeline parallelism combined with CPU offl…

Computer Science Preprint PDF DOI

Adaptive Self-Organization in Anonymous Dynamic Networks

Garrett Parzych, Joshua J. Daymude · 2026

We introduce the problem of adaptive self-organization in which the nodes of an anonymous, synchronous dynamic network must distributively change the collective distribution of their responses (or "co…

Computer Science Preprint PDF DOI

Revealing NVIDIA Closed-Source Driver Command Streams for CPU-GPU Runtime Behavior Insight

Yuang Yan, Ian Karlin, Ryan Grant · 2026

For NVIDIA GPUs, CUDA is the primary interface through which applications orchestrate GPU execution, yet much of the logic that realizes CUDA operations resides in NVIDIA's closed-source userspace dri…

Computer Science Preprint PDF DOI

FaaSMoE: A Serverless Framework for Multi-Tenant Mixture-of-Experts Serving

Minghe Wang, Trever Schirmer, Mohammadreza Malekabbasi, David Bermbach · 2026

Mixture-of-Experts (MoE) models offer high capacity with efficient inference cost by activating a small subset of expert models per input. However, deploying MoE models requires all experts to reside …

Computer Science Preprint PDF DOI

Exploring the Efficiency of 3D-Stacked AI Chip Architecture for LLM Inference with Voxel

Yiqi Liu, Noelle Crawford, Michael Wang, Jilong Xue, Jian Huang · 2026

To overcome the well-known memory bottleneck of AI chips, 3D stacked architectures that employ advanced packaging technology with high-density through-silicon vias (TSVs) pins have proven to be a prom…

Computer Science Preprint PDF DOI

Sparse-on-Dense: Area and Energy-Efficient Computing of Sparse Neural Networks on Dense Matrix Multiplication Accelerators

Hyunsung Yoon, Sungju Ryu, Jae-Joon Kim · 2026

As the size of Deep Neural Networks (DNNs) increases dramatically to achieve high accuracy, the DNNs require a large amount of computations and memory footprint. Pruning, which produces a sparse neura…

Computer Science Preprint PDF DOI

DUAL-BLADE: Dual-Path NVMe-Direct KV-Cache Offloading for Edge LLM Inference

Bodon Jeong, Hongsu Byun, Youngjae Kim, Weikuan Yu, Kyungkeun Lee, Jihoon Yang, Sungyong Park · 2026

The increasing deployment of Large Language Model (LLM) inference on edge AI systems demands efficient execution under tight memory budgets. A key challenge arises from Key-Value (KV) caches, which of…

Computer Science Preprint PDF DOI

FloatSOM: GPU-Accelerated, Distributed, Topology-Flexible Self-Organizing Maps

Tony Xu, Sarah Klamt, Katherine Turner, Anne Brustle, Felix Marsh-Wakefield, Givanna Putri · 2026

GPU-accelerated Self-Organizing Map (SOM) implementations are among the most competitive options for large-scale SOM analysis, but growing dataset sizes increasingly challenge their practical use beca…

Computer Science Preprint PDF DOI

Quantamination: Dynamic Quantization Leaks Your Data Across the Batch

Hanna Foerster, Ilia Shumailov, Cheng Zhang, Yiren Zhao, Jamie Hayes, Robert Mullins · 2026

Dynamic quantization emerged as a practical approach to increase the utilization and efficiency of the machine learning serving flow. Unlike static quantization, which applies quantization offline, dy…

Computer Science Preprint PDF DOI

Compressing ACAS-Xu Lookup Tables with Binary Decision Diagrams

Martin Boniol (ISAE-SUPAERO), Julien Brunel, Jean-Baptiste Chaudron (ISAE-SUPAERO), Christophe Garion (ISAE-SUPAERO), Xavier Thirioux (ISAE-SUPAERO) · 2026

The Airborne Collision Avoidance System Xu (ACAS-Xu) relies on large certified Look-Up Tables (LUTs) that encode the exact decision logic used in operation. Neural-network-based approximations have be…
