Pedro Cabalar in Computer Science — Research Repository

Computer Science Preprint PDF DOI

LLM-CEG: Extending the Classification Error Gauge Framework for Privacy Auditing of Large Language Models

Kato Mivule · 2026

This paper extends the Classification Error Gauge (x-CEG) framework, originally developed for measuring the privacy-utility trade-off in tabular datasets, to privacy auditing of Large Language Models …

A dataset of early blockchain-registered AI agents on Ethereum

Yulin Liu · 2026

This study presents a structured dataset of blockchain-registered artificial intelligence agents under the ERC-8004 standard on Ethereum. The dataset integrates on-chain identity records, minting tran…

Automated Extraction of Pharmacokinetic Parameters from Structured XML Scientific Articles: Enhancing Data Accessibility at Scale

Remya Ampadi Ramachandran, Lisa A. Tell, Sidharth Rai, Nuwan Millagaha Gedara, Hossein Sholehrasa, Jim E. Riviere, Majid Jaberi-Douraki · 2026

In the field of pharmacology, there is a notable absence of centralized, comprehensive, and up-to-date repositories of PK data. This poses a significant challenge for R&D as it can be a time-consuming…

Auditing LLMs for Algorithmic Fairness in Casenote-Augmented Tabular Prediction

Xiao Qi Lee, Ezinne Nwankwo, Angela Zhou · 2026

LLMs are increasingly being considered for prediction tasks in high-stakes social service settings, but their algorithmic fairness properties in this context are poorly understood. In this short techn…

Apple Peel Unfolding of Archimedean and Catalan Solids

Takashi Yoshino, Supanut Chaidee · 2026

We consider a new treatment for making polyhedron nets referred to as "apple peel unfolding": drawing the nets as if we were peeling off appleskins. We define apple peel unfolding strictly and imple…

LLMs Are Not a Silver Bullet: A Case Study on Software Fairness

Xinyue Li, Sixuan Li, Ying Xiao, Jie M. Zhang, Zhou Yang, Xuanzhe Liu, Zhenpeng Chen · 2026

Fairness is a critical requirement for human-related, high-stakes software systems, motivating extensive research on bias mitigation. Prior work has largely focused on tabular data settings using trad…

Optimizing IoT Intrusion Detection with Tabular Foundation Models for Smart City Forensics

Asma Al-Dahmani, Abdulla Bin Safwan, Mohammad Obeidat, Belal Alsinglawi · 2026

Security operations in smart cities demand detection systems that balance accuracy with response time. While ensemble methods like Random Forest achieve high accuracy, their computational overhead imp…

BDIViz in Action: Interactive Curation and Benchmarking for Schema Matching Methods

Eden Wu, Christos Koutras, Claudio T. Silva, Juliana Freire · 2026

Schema matching remains fundamental to data integration, yet evaluating and comparing matching methods is hindered by limited benchmark diversity and lack of interactive validation frameworks. BDIViz,…

ZoomTable: Interactive Exploration of Data Facts in Hierarchical Tables via Semantic Zooming

Qiyang Chen, Guozheng Li, Xingqi Wang, Gerile Aodeng, Min Lu, Chi Harold Liu · 2026

Hierarchical tables are an important structure for organizing data with inherent hierarchical relationships. Existing studies have extensively explored methods for data fact exploration from tabular d…

Raiven: LLM-Based Visualization Authoring via Domain-Specific Language Mediation

Alexandra Irger, Ella Hugie, Minghao Guo, Simon Warchol, Kenneth Moreland, David Pugmire, Wojciech Matusik, Hanspeter Pfister · 2026

Visualization is central to scientific discovery, yet authoring tools remain split between information and scientific visualization, and expertise in one rarely transfers to the other. Large Language …

A Catalog of Data Errors

Divya Bhadauria, Hazar Harmouch, Felix Naumann, Divesh Srivastava, Lisa Ehrlinger · 2026

Data errors are widespread in real-world databases and severely impact downstream applications, such as machine learning pipelines or business analytics reports. Causes of such errors are manifold and…

Improving Explanations: Applying the Feature Understandability Scale for Cost-Sensitive Feature Selection

Nicola Rossberg, Bennett Kleinberg, Barry O'Sullivan, Luca Longo, Andrea Visentin · 2026

With the growing pervasiveness of artificial intelligence, the ability to explain the inferences made by machine learning models has become increasingly important. Numerous techniques for model explai…

From BM25 to Corrective RAG: Benchmarking Retrieval Strategies for Text-and-Table Documents

Meftun Akarsu, Recep Kaan Karaman, Christopher Mierbach · 2026

Retrieval-Augmented Generation (RAG) systems critically depend on retrieval quality, yet no systematic comparison of modern retrieval methods exists for heterogeneous documents containing both text an…

GISclaw: An Open-Source LLM-Powered Agent System for Full-Stack Geospatial Analysis

Jinzhen Han, JinByeong Lee, Yuri Shim, Jisung Kim, Jae-Joon Lee · 2026

The convergence of Large Language Models (LLMs) and Geographic Information Science has opened new avenues for automating complex geospatial analysis. However, existing LLM-powered GIS agents are const…

Revealing the influence of participant failures on model quality in cross-silo Federated Learning

Fabian Stricker, David Bermbach, Christian Zirpins · 2026

Federated Learning (FL) is a paradigm for training machine learning (ML) models in collaborative settings while preserving participants' privacy by keeping raw data local. A key requirement for the us…

Unveiling the Security Risks of Federated Learning in the Wild: From Research to Practice

Jiahao Chen, Zhiming Zhao, Yuwen Pu, Chunyi Zhou, Zhou Feng, Songze Li, Shouling Ji · 2026

Federated learning (FL) has attracted substantial attention in both academia and industry, yet its practical security posture remains poorly understood. In particular, a large body of poisoning resear…

Galaxy Tracer: A Topology-First 3D Interface for Interactive PCAP Exploration

Ryan Younger · 2026

Packet analysis tools conventionally present capture data through tabular packet lists, constraining the analyst to a sequential view that obscures the relational structure of network communication. T…

Financial Transaction Retrieval and Contextual Evidence for Knowledge-Grounded Reasoning

Artem Sakhno, Daniil Tomilov, Yuliana Shakhvalieva, Inessa Fedorova, Daria Ruzanova, Omar Zoloev, Andrey Savchenko, Maksim Makarenko · 2026

Nowadays, success of financial organizations heavily depends on their ability to process digital traces generated by their clients, e.g., transaction histories, gathered from various sources to improv…

SliceMapper: Intelligent Mapping of O-CU and O-DU onto O-Cloud Sites in 6G O-RAN

Mohammad Asif Habibi, Xavier Costa-Perez, Hans D. Schotten · 2026

In this paper, we propose an rApp, named SliceMapper, to optimize the mapping process of the open centralized unit (O-CU) and open distributed unit (O-DU) of an open radio access network (O-RAN) slice…

TableMark: A Multi-bit Watermark for Synthetic Tabular Data

Yuyang Xia, Yaoqiang Xu, Chen Qian, Yang Li, Guoliang Li, Jianhua Feng · 2026

Watermarking has emerged as an effective solution for copyright protection of synthetic data. However, applying watermarking techniques to synthetic tabular data presents challenges, as tabular data c…
