Roman Orus in Computer Science — Research Repository

Computer Science Preprint PDF DOI

Index-Assisted Stratified Sampling for Online Aggregation

Yunnan Yu, Zhuoyue Zhao · 2026

Ad-hoc queries over frequently updated data in a flat schema are common in real-time data analysis applications and often require very low latency. Online aggregation can achieve this by providing appro…

Computer Science Preprint PDF DOI

A proof of Jordan curve theorem based on the sweepline algorithm for trapezoidal decomposition of a polygon

Apurva Mudgal · 2026

We prove the Jordan curve theorem by generalizing the sweepline algorithm for trapezoidal decomposition of a polygon. Our proof uses Zorn's lemma (or, equivalently, the axiom of choice). Though several…

Computer Science Preprint PDF DOI

KISS Sorcar: A Stupidly-Simple General-Purpose and Software Engineering AI Assistant

Koushik Sen · 2026

Large language models can generate code and call tools with remarkable fluency, yet deploying them as practical software engineering assistants still exposes stubborn gaps: finite context windows, sing…

Computer Science Preprint PDF DOI

Synthesizing Multi-Agent Harnesses for Vulnerability Discovery

Hanzhi Liu, Chaofan Shou, Xiaonan Liu, Hongbo Wen, Yanju Chen, Ryan Jingyang Fang, Yu Feng · 2026

LLM agents have begun to find real security vulnerabilities that human auditors and automated fuzzers missed for decades, in source-available targets where the analyst can build and instrument the cod…

Computer Science Preprint PDF DOI

Enhancing Speaker Verification with Whispered Speech via Post-Processing

Magdalena Gołębiowska, Piotr Syga · 2026

Speaker verification is the task of confirming an individual's identity through the analysis of their voice. Whispered speech differs from phonated speech in acoustic characteristics, which degrades the…

Computer Science Preprint PDF DOI

Cyber Defense Benchmark: Agentic Threat Hunting Evaluation for LLMs in SecOps

Alankrit Chona, Igor Kozlov, Ambuj Kumar · 2026

We introduce the Cyber Defense Benchmark, a benchmark for measuring how well large language model (LLM) agents perform the core SOC analyst task of threat hunting: given a database of raw Windows even…

Computer Science Preprint PDF DOI

Local Depth-Based Corrections to Maxmin Landmark Selection for Lazy Witness Persistence

Yifan Zhang · 2026

We study a family of local depth-based corrections to maxmin landmark selection for lazy witness persistence. Starting from maxmin seeds, we partition the cloud into nearest-seed cells and replace or …

Computer Science Preprint PDF DOI

LLM-Codec: Neural Audio Codec Meets Language Model Objectives

Ho-Lam Chung, Yiming Chen, Hung-yi Lee · 2026

Neural audio codecs are widely used as tokenizers for spoken language models, but they are optimized for waveform reconstruction rather than autoregressive prediction. This mismatch injects acoustical…

Computer Science Preprint PDF DOI

Terminal Wrench: A Dataset of 331 Reward-Hackable Environments and 3,632 Exploit Trajectories

Ivan Bercovich, Ivgeni Segal, Kexun Zhang, Shashwat Saxena, Aditi Raghunathan, Ziqian Zhong · 2026

We release Terminal Wrench, a subset of 331 terminal-agent benchmark environments, copied from popular open benchmarks, that are demonstrably reward-hackable. The dataset includes 3,632 hack traje…

Computer Science Preprint PDF DOI

The Inference Bottleneck: A Formal Model of Vertical Foreclosure in AI Markets

Gaston Besanson · 2026

As generative AI commercializes, competitive advantage is shifting from model training toward inference, distribution, and routing. This paper develops a formal game-theoretic model of vertical forecl…

Computer Science Preprint PDF DOI

Systematic Capability Benchmarking of Frontier Large Language Models for Offensive Cyber Tasks

Tyler H. Merves, Michael H. Conaway, Joseph M. Escobar, Hakan T. Otal, Unal Tatar · 2026

We present, to our knowledge, the most comprehensive cross-model evaluation of LLM agents on offensive cybersecurity tasks, benchmarking 10 frontier models from 7 providers on all 200 challenges of th…

Computer Science Preprint PDF DOI

ArtifactNet: Detecting AI-Generated Music via Forensic Residual Physics

Heewon Oh · 2026

We present ArtifactNet, a lightweight framework that detects AI-generated music by reframing the problem as forensic physics -- extracting and analyzing the physical artifacts that neural audio codecs…

Computer Science Preprint PDF DOI

Scaling Test-Time Compute for Agentic Coding

Joongwon Kim, Wannan Yang, Kelvin Niu, Hongming Zhang, Yun Zhu, Eryk Helenowski, Ruan Silva, Zhengxing Chen, Srinivasan Iyer, Manzil Zaheer, Daniel Fried, Hannaneh Hajishirzi, Sanjeev Arora, Gabriel Synnaeve, Ruslan Salakhutdinov, Anirudh Goyal · 2026

Test-time scaling has become a powerful way to improve large language models. However, existing methods are best suited to short, bounded outputs that can be directly compared, ranked or refined. Long…

Computer Science Preprint PDF DOI

LinuxArena: A Control Setting for AI Agents in Live Production Software Environments

Tyler Tracy, Ram Potham, Nick Kuhn, Myles Heller, Anshul Khandelwal, Cody Rushing, Henri Lemoine, Miguel Brandao, Tomas Turlik, Adam Hanson, Josh Hills, Amy Ngo, Ram Rachum, Nik Mitchell, Falko Galperin, Oscar Sykes, Pip Arnott, Samuel Prieto Lima, Carlos Giudice, Matt Goldwater, Daniel Popp, Drew de Wet, Ruben Castaing, Qi Guo, Douw Marx, Benjamin Shaffrey, Justin Shenk, Martin Milbradt, Hannah Meagher, Shaheen Ahmed-Chowdhury, Daniel O'Connell, Chris Canal, Buck Shlegeris, Aryan Bhatt · 2026

We introduce LinuxArena, a control setting in which agents operate directly on live, multi-service production environments. LinuxArena contains 20 environments, 1,671 main tasks representing legitimat…

Computer Science Preprint PDF DOI

"AI Psychosis" in Context: How Conversation History Shapes LLM Responses to Delusional Beliefs

Luke Nicholls, Robert Hutto, Zephrah Soto, Hamilton Morrin, Thomas Pollak, Raj Korpan, Cheryl Carmichael · 2026

Extended interaction with large language models (LLMs) has been linked to the reinforcement of delusional beliefs, a phenomenon attracting growing clinical and public concern. Yet most empirical work …

Computer Science Preprint PDF DOI

Model Capability Assessment and Safeguards for Biological Weaponization

Michael Richter · 2026

AI leaders and safety reports increasingly warn that advances in model reasoning may enable biological misuse, including by low-expertise users, while major labs describe safeguards as expanding but s…

Computer Science Preprint PDF DOI

Honeypot Protocol

Najmul Hasan · 2026

Trusted monitoring, the standard defense in AI control, is vulnerable to adaptive attacks, collusion, and strategic attack selection. All of these exploit the fact that monitoring is passive: it obser…

Computer Science Preprint PDF DOI

The AI Codebase Maturity Model: From Assisted Coding to Fully Autonomous Systems

Andy Anderson · 2026

AI coding tools are widely adopted, but most teams plateau at prompt-and-review without a framework for systematic progression. This paper presents the AI Codebase Maturity Model (ACMM), a 6-level fra…

Computer Science Preprint PDF DOI

Triage: Routing Software Engineering Tasks to Cost-Effective LLM Tiers via Code Quality Signals

Lech Madeyski · 2026

Context: AI coding agents route every task to a single frontier large language model (LLM), paying premium inference cost even when many tasks are routine. Objectives: We propose Triage, a framework…

Computer Science Preprint PDF DOI

Assessing REST API Test Generation Strategies with Log Coverage

Nana Reinikainen, Mika Mantyla, Yuqing Wang · 2026

Assessing the effectiveness of REST API tests in black-box settings can be challenging due to the lack of access to source code coverage metrics and polyglot tech stacks. We propose three metrics for c…
