Statistics in Computer Science — Research Repository

Computer Science Preprint PDF DOI

Index-Assisted Stratified Sampling for Online Aggregation

Yunnan Yu, Zhuoyue Zhao · 2026

Ad-hoc queries over frequently updated data in a flat schema are common in real-time data analysis applications and often require very low latency. Online aggregation can achieve this by providing appro…

Perfectly Private Over-the-Air Computation

Shudi Weng, Ming Xiao, Mikael Skoglund · 2026

This paper studies a key research question: how to achieve perfect privacy in over-the-air computation (AirComp)? The problem is particularly intriguing due to a dilemma. Real-field operations can ens…

NeuroRing: Scaling Spiking Neural Networks via Multi-FPGA Bidirectional Ring Topologies and Stream-Dataflow Architectures

Muhammad Ihsan Al Hafiz, Artur Podobas · 2026

Spiking neural networks (SNNs) are a promising paradigm for energy-efficient event-driven computation, but large-scale SNN execution remains challenging because sparse spike communication and synchron…

ZipCCL: Efficient Lossless Data Compression of Communication Collectives for Accelerating LLM Training

Wenxiang Lin, Xinglin Pan, Ruibo Fan, Shaohuai Shi, Xiaowen Chu · 2026

Communication has emerged as a critical bottleneck in the distributed training of large language models (LLMs). While numerous approaches have been proposed to reduce communication overhead, the poten…

Secure Cross-Silo Synthetic Genomic Data Generation

Daniil Filienko, Martine De Cock, Sikha Pentyala · 2026

Access to genomic data is highly regulated due to its sensitive nature. While safeguards are essential, cumbersome data access processes pose a significant barrier to the development of AI methods for…

The Likelihood Ratio Wall: Structural Limits on Accurate Risk Assessment for Rare Violence

Marco Pollanen · 2026

Pretrial risk assessment tools are used on over one million U.S. defendants each year, yet their use for predicting rare violent re-offense faces a basic statistical barrier. We derive a universal pre…

BLINC: Context-Specific Causal Learning for Automated RAN Configuration

Reshma Prasad, Michele Polese, Tommaso Melodia · 2026

Radio Access Network (RAN) configuration has traditionally required significant manual effort due to indirect causal dependencies between observable Key Performance Indicators (KPIs), and context-depe…

MISES: Minimal Information Sufficiency for Effective Service

Joss Armstrong · 2026

Category-based coordination mechanisms allocate resources by mapping a declared service category to a fixed resource profile, without observing individual demand types. We establish three results for …

A Sufficient-Statistic Reduction of the Information Bottleneck to a Low-Dimensional Problem

Joss Armstrong · 2026

We show that if the conditional distribution p(C | T) factors through a sufficient statistic φ(T), then the Information Bottleneck (IB) problem for (T, C) is exactly equivalent to the IB problem …

COPUS: Co-adaptive Parallelism and Batch Size Selection in Large Language Model Training

Akhmed Sakip, Erland Hilman Fuadi, Omar Sayedelahl, Zonghang Li, Jianshu She, Alham Fikri Aji, Steve Liu, Eric Xing, Qirong Ho · 2026

Training large language models requires jointly configuring two interdependent aspects of the system: the global batch size, which governs statistical efficiency, and the 3D parallelism strategy, whic…

Will It Break in Production? Metric-Driven Prediction of Residual Defects in Python Systems

Giuseppe De Rosa, Pietro Liguori · 2026

Python's dynamic nature complicates testing and increases the possibility that some defects evade detection, so effective fault prediction becomes essential. We examine whether post-release faults …

Rank Distribution and Dynamics of Gram Matrices from Binary m-Sequences with Applications to LCD Codes

Hengfeng Liu, Chunming Tang, Cuiling Fan, Zhengchun Zhou · 2026

The Gram matrix is a classical object formed from the pairwise inner products of a collection of vectors, with fundamental roles in functional analysis, statistics, combinatorics, and coding theory. I…

Recurrence-Based Nonlinear Vocal Dynamics as Digital Biomarkers for Depression Detection from Conversational Speech

Himadri S Samanta · 2026

Digital biomarkers for depression have largely relied on static acoustic descriptors, pooled summary statistics, or conventional machine learning representations. Such approaches may miss nonlinear te…

Calibrated Persistent Homology Tests for High-dimensional Collapse Detection

Alexander Kalinowski · 2026

We study detection of collapse in high-dimensional point clouds, where mass concentrates near a lower-dimensional set relative to a non-collapsed geometry. We propose persistent homology-based test st…

Does social identity matter in software engineering? Assessing the case of research software engineers

Chukwudi Uwasomba, Tamara Lopez, Melanie Langer, Helen Sharp, Michel Wermelinger, Caroline Jay, Mark Levine, Bashar Nuseibeh · 2026

Social identity is a concept from psychology that refers to the part of an individual's identity that derives from their group membership(s). In this paper, we explore social identity in members of th…

Using Large Language Models for Black-Box Testing of FMU-Based Simulations

Abdullah Mughees, Gaadha Sudheerbabu, Tanwir Ahmad, Dragos Truscan, Mikael Manngård, Kristian Klemets · 2026

We propose a human-in-the-loop approach for black-box testing of Functional Mock-up Units (FMUs) using Large Language Models (LLMs). The goal is to reduce the manual effort in defining test scenarios …

The Surprising Universality of LLM Outputs: A Real-Time Verification Primitive

Alex Bogdan, Adrian de Valois-Franklin · 2026

We report a striking statistical regularity in frontier LLM outputs that enables a CPU-only scoring primitive running at 2.6 microseconds per token, with estimated latency up to 100,000× (five …

Generating Synthetic Citation Networks with Communities

Łukasz Brzozowski, Marek Gagolewski, Grzegorz Siudem · 2026

Generating realistic synthetic citation, patent, or component dependency networks is essential for benchmarking community detection, graph visualisation, and network data mining algorithms. We present…

Stop Using the Wilcoxon Test: Myth, Misconception and Misuse in IR Research

Julian Urbano · 2026

In benchmarking of Information Retrieval systems, the Wilcoxon signed-rank test is often treated as a safer alternative to the t-test. This belief is fueled by textbooks and recommendations that portr…

Feature Anchors for Time-Series Sensor-Based Human Activity Recognition

Ruijie Yao, Chenhang Li, Danyang Zhuo, Tingjun Chen, Xiaoyue Ni · 2026

Wearable Human Activity Recognition (HAR) still lacks a representation that is both explicit and adaptable. Handcrafted time-series features (TSFs) capture meaningful motion statistics and remain comp…
