Thomas Y. Hou in Computer Science — Research Repository

Computer Science · Preprint

Index-Assisted Stratified Sampling for Online Aggregation

Yunnan Yu, Zhuoyue Zhao · 2026

Ad-hoc queries over frequently updated data in a flat schema are common in real-time data analysis applications and often require very low latency. Online aggregation can achieve this by providing appro…

Computer Science · Preprint

Akita: A High Usability Simulation Framework for Computer Architecture

Sabila Al Jannat, Ying Li, Mengyang He, Xuzhong Wang, Huizhi Zhao, Jingxiang Sun, Daoxuan Xu, Enze Xu, Yifan Sun · 2026

Computer architecture simulation is essential for evaluating new designs without the need for costly tapeout. The community has developed dozens of valuable simulators that have enabled significant ar…

Computer Science · Preprint

When and How AI Should Assist Brainstorming for AI Impact Assessment

Jarod Govers, Sanja Scepanovic, Daniele Quercia · 2026

A key task in AI practice is to assess potential impacts to prevent harm. Current AI tools assisting AI impact assessment have not been designed or evaluated for collaborative team brainstorming, and …

Computer Science · Preprint

A Generalisation of Goursat's Algorithm for Integration in Finite Terms

Sam Blake · 2026

We give a self-contained, modern exposition of Édouard Goursat's 1887 theorem on pseudo-elliptic integrals -- those integrals of the form $\int F(t)\,\mathrm{d}t/\sqrt{R(t)}$ with $R$ a cubic or quartic po…

Computer Science · Preprint

How Generative AI Disrupts Search: An Empirical Study of Google Search, Gemini, and AI Overviews

Riley Grossman, Songjiang Liu, Michael K. Chen, Mike Smith, Cristian Borcea, Yi Chen · 2026

Generative AI is being increasingly integrated into web search for the convenience it provides users. In this work, we aim to understand how generative AI disrupts web search by retrieving and present…

Computer Science · Preprint

Test Before You Deploy: Governing Updates in the LLM Supply Chain

Mohd Sameen Chishti, Damilare Peter Oyinloye, Jingyue Li · 2026

Large Language Models (LLMs) are increasingly used as core dependencies in software systems. However, the hosted LLM services evolve continuously through provider-side updates without explicit version…

Computer Science · Preprint

Why Self-Supervised Encoders Want to Be Normal

Yuval Domb · 2026

We develop a geometric and information-theoretic framework for encoder-decoder learning built on the Information Bottleneck (IB) principle. Recasting IB as a rate-distortion problem with Kullback-Leib…

Computer Science · Preprint

How Code Representation Shapes False-Positive Dynamics in Cross-Language LLM Vulnerability Detection

Maofei Chen, Laifu Wang, Yue Qin, Yuan Wang, Bo Wu, Dongxin Liu · 2026

How code representation format shapes false positive behaviour in cross-language LLM vulnerability detection remains poorly understood. We systematically vary training intensity and code representatio…

Computer Science · Preprint

PuzzleMark: Implicit Jigsaw Learning for Robust Code Dataset Watermarking in Neural Code Completion Models

Haocheng Huang, Yuchen Chen, Weisong Sun, Peizhuo Lv, Yuan Xiao, Chunrong Fang, Yang Liu, Xiaofang Zhang · 2026

Constructing and curating high-quality code datasets requires significant resources, making them valuable intellectual property. Unfortunately, these datasets currently face severe risks of unauthoriz…

Computer Science · Preprint

Users' Activity Logs: the Good, the Bad, the Misconception, and the Disastrous

Eman Alashwali · 2026

Most service providers, such as Google, save logs from data generated by users while using the service. Many service providers provide users with privacy controls to manage whether, how, and for how l…

Computer Science · Preprint

Structural Dissolution: How Artificial Intelligence Dismantles Coordination Architecture and Reconfigures the Political Economy of Production

Chao Li (AI Edtech Governance Trust, Independent Researcher in AI Governance), Chunyi Zhao (AI Edtech Governance Trust, Independent Researcher in AI Governance) · 2026

This paper introduces the Structural Dissolution Framework to explain how artificial intelligence restructures the coordination architecture of traditional industries. We argue that AI dissolves the b…

Computer Science · Preprint

From Notepad AI to Social Media: How Can Text Style Transformation Mitigate Social Harm?

Syed Mhamudul Hasan, Mohd. Farhan Israk Soumik, Abdur R. Shahid · 2026

The rapid proliferation of harmful and emotionally damaging content on social media platforms has intensified concerns regarding societal harm. While content moderation efforts primarily focus on dete…

Computer Science · Preprint

One Size Fits All? An Empirical Comparison of ADR Templates regarding Comprehension, Usability, and Ease of Adoption

Fernando Nogueira, Nabson Silva, Tayana Conte · 2026

Context: Documenting Architectural Design Decisions (ADDs) is a critical factor in the software lifecycle, essential for efficient system maintenance, developer onboarding, and preventing knowledge va…

Computer Science · Preprint

NuggetIndex: Governed Atomic Retrieval for Maintainable RAG

Saber Zerhoudi, Michael Granitzer, Jelena Mitrovic · 2026

Retrieval-augmented generation (RAG) systems are frequently evaluated via fact-based metrics, yet standard implementations retrieve passages or static propositions. This unit mismatch between evaluati…

Computer Science · Preprint

The Likelihood Ratio Wall: Structural Limits on Accurate Risk Assessment for Rare Violence

Marco Pollanen · 2026

Pretrial risk assessment tools are used on over one million U.S. defendants each year, yet their use for predicting rare violent re-offense faces a basic statistical barrier. We derive a universal pre…

Computer Science · Preprint

Hot Fixing in the Wild

Carol Hanna, Karine Even-Mendoza, W.B. Langdon, Mar Zamorano Lopez, Justyna Petke, Federica Sarro · 2026

Despite the operational importance of hot fixes, large-scale evidence on how they reshape routine maintenance workflows, particularly in the era of autonomous coding agents, remains limited. We analys…

Computer Science · Preprint

A Test Taxonomy and Continuous Integration Ecosystem for Dynamic Resource Management in HPC

Petter Sandås, Inigo Arejula-Aisa, Sergio Iserte, Antonio J. Pena · 2026

High-performance computing (HPC) systems are increasingly exploring dynamic resource management and malleable MPI applications to better adapt to heterogeneous architectures, fluctuating workloads, an…

Computer Science · Preprint

Breaking Bad Financial Habits: How LLM Conversations Correct Financial Misconceptions

Jillian Ross, Eric So, Andrew W. Lo · 2026

Financial misconceptions carry direct economic costs, from panic selling to equity market avoidance, yet they are notoriously resistant to correction. Traditional financial literacy interventions are …

Computer Science · Preprint

When to Retrieve During Reasoning: Adaptive Retrieval for Large Reasoning Models

Dongxin Guo, Jikun Wu, Siu Ming Yiu · 2026

Large reasoning models such as DeepSeek-R1 and OpenAI o1 generate extended chains of thought spanning thousands of tokens, yet their integration with retrieval-augmented generation (RAG) remains funda…

Computer Science · Preprint

The Buy-or-Build Decision, Revisited: How Agentic AI Changes the Economics of Enterprise Software

David Klotz · 2026

Advances in generative artificial intelligence, particularly agentic coding systems capable of autonomous software development, are disrupting the economics of the make-or-buy decision for enterprise …
