Andrea Testa in Computer Science — Research Repository

Computer Science Preprint PDF DOI

From Mirage to Grounding: Towards Reliable Multimodal Circuit-to-Verilog Code Generation

Guang Yang, Xing Hu, Xiang Chen, Xin Xi · 2026

Multimodal large language models (MLLMs) are increasingly used to translate visual artifacts into code, from UI mockups into HTML to scientific plots into Python scripts. A circuit diagram can be view…


SimEval-IR: A Unified Toolkit and Benchmark Suite for Evaluating User Simulators and Search Sessions

Saber Zerhoudi · 2026

User simulators are increasingly central to interactive information retrieval, yet the community lacks standardized evaluation tools. Simulators serve two objectives, behavioral realism (matching real…


Test Before You Deploy: Governing Updates in the LLM Supply Chain

Mohd Sameen Chishti, Damilare Peter Oyinloye, Jingyue Li · 2026

Large Language Models (LLMs) are increasingly used as core dependencies in software systems. However, the hosted LLM services evolve continuously through provider-side updates without explicit version…


LLM-as-a-Judge for Human-AI Co-Creation: A Reliability-Aware Evaluation Framework for Coding

Md Faizul Ibne Amin, Yutaka Watanobe, Daniel M. Muepu, Haruto Suzuki, Kenta Nanaumi, Md Mostafizer Rahman · 2026

LLMs are increasingly employed both as judges for evaluating open-ended outputs and as co-creation partners in AI-assisted programming; yet rigorous evaluation in human-AI co-creation settings remains…


How Code Representation Shapes False-Positive Dynamics in Cross-Language LLM Vulnerability Detection

Maofei Chen, Laifu Wang, Yue Qin, Yuan Wang, Bo Wu, Dongxin Liu · 2026

How code representation format shapes false positive behaviour in cross-language LLM vulnerability detection remains poorly understood. We systematically vary training intensity and code representatio…


Line Segment Clipping using Quadrilateral Concavity and Convexity

Bimal Kumar Ray · 2026

This paper proposes an algorithm for clipping a line segment against an axis-aligned rectangular window. The conventional algorithms for line segment clipping treat the clipping boundary and/or the line…


PuzzleMark: Implicit Jigsaw Learning for Robust Code Dataset Watermarking in Neural Code Completion Models

Haocheng Huang, Yuchen Chen, Weisong Sun, Peizhuo Lv, Yuan Xiao, Chunrong Fang, Yang Liu, Xiaofang Zhang · 2026

Constructing and curating high-quality code datasets requires significant resources, making them valuable intellectual property. Unfortunately, these datasets currently face severe risks of unauthoriz…


An Exact 56-Addition, Rank-23 Scheme for General 3*3 Matrix Multiplication

Yinqi Sun · 2026

We present a rank-$23$ algorithm for general $3\times3$ matrix multiplication that uses $56$ additions/subtractions and $23$ multiplications, for a total of $79$ scalar operations in the standard bili…


ScaleBox: Enabling High-Fidelity and Scalable Code Verification for Large Language Models

Jiasheng Zheng, Xin Zheng, Boxi Cao, Pengbo Wang, Zhengzhao Ma, Qiming Zhu, Jiazhen Jiang, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun · 2026

Code sandboxes have emerged as a critical infrastructure for advancing the coding capabilities of large language models, providing verifiable feedback for both RL training and evaluation. However, exi…


Predicting Upcoming Stuttering Events from Three-Second Audio: Stratified Evaluation Reveals Severity-Selective Precursors, and the Model Deploys Fully On-Device

Nazar Kozak · 2026

Audio-based stuttering systems to date have been trained for detection -- what disfluency is present now -- leaving prediction, the capability needed for closed-loop intervention, unstudied at deploya…


The Synthetic Social Graph: Emergent Behavior in AI Agent Communities

Sungguk Cha, DongWook Kim · 2026

Large language model (LLM) agents are increasingly deployed in social settings, yet little is known about how they interact in open-ended environments. We present the first comprehensive sociological …


SynSQL: Synthesizing Relational Databases for Robust Evaluation of Text-to-SQL Systems

Mohammadamin Habibollah, Davood Rafiei · 2026

Evaluating text-to-SQL systems remains largely fragile: correctness is typically judged by executing predicted and gold SQL queries on a single static database, even though the same queries may behave…


CI-Repair-Bench: A Repository-Aware Benchmark for Automated Patch Validation via CI Workflows

Rabeya Khatun Muna, Md Nakhla Rafi, Tse-Hsun (Peter) Chen · 2026

Continuous Integration (CI) enforces repository-level correctness through multi-stage workflows and is central to modern software development, yet diagnosing and repairing CI failures remains challeng…


On the Effectiveness of Modular Testing with EvoSuite

Elizabeth Dinella · 2026

This paper explores the effectiveness of modular randomized testing for object oriented programs in Java. Modular testing involves testing individual components of a program in isolation. Often times,…


ClassEval-Pro: A Cross-Domain Benchmark for Class-Level Code Generation

Yeheng Chen, Chaoxiang Xie, Yuling Shi, Wenhao Zeng, Yongpan Wang, Hongyu Zhang, Xiaodong Gu · 2026

LLMs have achieved strong results on both function-level code synthesis and repository-level code modification, yet a capability that falls between these two extremes -- compositional code creation, i…


Hot Fixing in the Wild

Carol Hanna, Karine Even-Mendoza, W.B. Langdon, Mar Zamorano Lopez, Justyna Petke, Federica Sarro · 2026

Despite the operational importance of hot fixes, large-scale evidence on how they reshape routine maintenance workflows, particularly in the era of autonomous coding agents, remains limited. We analys…


A Test Taxonomy and Continuous Integration Ecosystem for Dynamic Resource Management in HPC

Petter Sandås, Inigo Arejula-Aisa, Sergio Iserte, Antonio J. Pena · 2026

High-performance computing (HPC) systems are increasingly exploring dynamic resource management and malleable MPI applications to better adapt to heterogeneous architectures, fluctuating workloads, an…


A Toolkit for Detecting Spurious Correlations in Speech Datasets

Lara Gauder, Pablo Riera, Andrea Slachevsky, Gonzalo Forno, Adolfo M. Garcia, Luciana Ferrer · 2026

We introduce a toolkit for uncovering spurious correlations between recording characteristics and target class in speech datasets. Spurious correlations may arise due to heterogeneous recording condit…


Reproducible Automated Program Repair Is Hard -- Experiences With the Defects4J Dataset

Adam Krafczyk, Klaus Schmid · 2026

In the research of automated program repair (APR), benchmark datasets consisting of known defects in combination with test suites that indicate the defects are of high importance. They allow for an ev…


TDD Governance for Multi-Agent Code Generation via Prompt Engineering

Tarlan Hasanli, Shahbaz Siddeeq, Bishwash Khanal, Pyry Kotilainen, Tommi Mikkonen, Pekka Abrahamsson · 2026

Large language models (LLMs) accelerate software development but often exhibit instability, non-determinism, and weak adherence to development discipline in unconstrained workflows. While test-driven …

