Haim Avron in Computer Science — Research Repository

Computer Science Preprint PDF DOI

When and How AI Should Assist Brainstorming for AI Impact Assessment

Jarod Govers, Sanja Scepanovic, Daniele Quercia · 2026

A key task in AI practice is to assess potential impacts to prevent harm. Current AI tools assisting AI impact assessment have not been designed or evaluated for collaborative team brainstorming, and …

From Notepad AI to Social Media: How Can Text Style Transformation Mitigate Social Harm?

Syed Mhamudul Hasan, Mohd. Farhan Israk Soumik, Abdur R. Shahid · 2026

The rapid proliferation of harmful and emotionally damaging content on social media platforms has intensified concerns regarding societal harm. While content moderation efforts primarily focus on dete…

Lexical Anthropomorphization Influences on Moral Judgments of AI Bad Behavior

Jaime Banks, Nicholas David Bowman, Roman Saladino · 2026

Anthropomorphic language describing artificial intelligence (AI) is widespread in media, policy, and everyday discourse; so too are discussions of AI bad behavior, from hallucinations to inappropriate…

Safety Drift After Fine-Tuning: Evidence from High-Stakes Domains

Emaan Bilal Khan, Amy Winecoff, Miranda Bogen, Dylan Hadfield-Menell · 2026

Foundation models are routinely fine-tuned for use in particular domains, yet safety assessments are typically conducted only on base models, implicitly assuming that safety properties persist through…

Why AI Harms Can't Be Fixed One Identity at a Time: What 5300 Incident Reports Reveal About Intersectionality

Edyta Bogucka, Sanja Scepanovic, Daniele Quercia · 2026

AI risk assessment is the primary tool for identifying harms caused by AI systems. These include intersectional harms, which arise from the interaction between identity categories (e.g., class and ski…

Designing escalation criteria for international AI incident response: criteria, triggers, and thresholds

Francesca Gomez, Matthew Ball, Michael Harre, Lydia Preston, Josephine Schwab, Caio Machado · 2026

AI incident reporting requirements are emerging in regulation and policy, yet no operational criteria exist for determining when a detected AI incident warrants escalation beyond national handling to …

What People See (and Miss) About Generative AI Risks: Perceptions of Failures, Risks, and Who Should Address Them

Megan Li, Wendy Bickersteth, Ningjing Tang, Parv Kapoor, Khinezin Win, Peter Zhong, Jason I. Hong, Lorrie Faith Cranor, Hoda Heidari, Hong Shen · 2026

Despite growing concerns about the risks of Generative AI (GenAI), there is limited understanding of public perceptions of these risks and their associated failure modes -- defined as recurring patter…

Recognition Without Authorization: LLMs and the Moral Order of Online Advice

Tom van Nuenen · 2026

Large language models are increasingly used to mediate everyday interpersonal dilemmas, yet how their advisory defaults interact with the concentrated moral orders of specific communities remains poor…

Lessons from External Review of DeepMind's Scheming Inability Safety Case

Stephen Barrett, Francisco Javier Campos Zabala, Sean P. Fillingham, Umair Siddique, James Walpole, Robin Bloomfield, Henry Papadatos · 2026

Safety cases for frontier AI systems should provide a convincing argument, supported by evidence, that the risk of harm is within an acceptable bound. When developers author their own safety cases, co…

Black-Box Skill Stealing Attack from Proprietary LLM Agents: An Empirical Study

Zihan Wang, Rui Zhang, Yu Liu, Chi Liu, Qingchuan Zhao, Hongwei Li, Guowen Xu · 2026

Large language model (LLM) agents increasingly rely on skills to package reusable capabilities through instructions, tools, and resources. High-quality skills embed expert knowledge, curated workflows…

A pragmatic classification of AI incident trajectories

Isaak Mengesha, Branwen Owen, Charlie Collins, Tina Wong, Simon Mylius, Peter Slattery, Sean McGregor · 2026

Public AI incident database counts conflate changes in reporting propensity, deployment growth, and shifts in harm frequency per unit of exposure. These issues introduce significant uncertainties chal…

Bias in the Tails: How Name-conditioned Evaluative Framing in Resume Summaries Destabilizes LLM-based Hiring

Huy Nghiem, Phuong-Anh Nguyen-Le, Sy-Tuyen Ho, Hal Daume III · 2026

Research has documented LLMs' name-based bias in hiring and salary recommendations. In this paper, we instead consider a setting where LLMs generate candidate summaries for downstream assessment. In a…

Different Paths to Harmful Compliance: Behavioral Side Effects and Mechanistic Divergence Across LLM Jailbreaks

Md Rysul Kabir, Zoran Tiganj · 2026

Open-weight language models can be rendered unsafe through several distinct interventions, but the resulting models may differ substantially in capabilities, behavioral profile, and internal failure m…

Tight Auditing of Differential Privacy in MST and AIM

Georgi Ganev, Meenatchi Sundaram Muthu Selva Annamalai, Bogdan Kulynych · 2026

State-of-the-art Differentially Private (DP) synthetic data generators such as MST and AIM are widely used, yet tightly auditing their privacy guarantees remains challenging. We introduce a Gaussian D…

Inductive Dual-Polarity Modeling via Static-Dynamic Disentanglement for Dynamic Signed Networks

Yikang Hou, Junjie Huang, Yijun Ran, Tao Jia · 2026

Dynamic signed networks (DSNs) are common in online platforms, where time-stamped positive and negative relations evolve over time. A core task in DSNs is dynamic edge prediction, which forecasts futu…

Leveraging AI for Direct Bystander Intervention Against Cyberbullying

Peinuan Qin, Jiting Cheng, Jungup Lee, Junti Zhang, Zhixing Liu, Yi-Chieh Lee · 2026

Cyberbullying is a pervasive problem in online environments, causing substantial psychological harm to victims. Although bystander intervention has proven effective in mitigating its impact, motivatin…

Enabling Sensitive Conversations with Consent Boundaries: Moa, a Platform for Discussing PhD Advising Relationships

Jane Im, Kentaro Toyama · 2026

When an individual is harmed by someone in power, such as a workplace manager, it can help to identify allies--people who would offer sympathy, advice, or supportive action. However, ally discovery is…

Owner-Harm: A Missing Threat Model for AI Agent Safety

Dongcheng Zhang, Yiqing Jiang · 2026

Existing AI agent safety benchmarks focus on generic criminal harm (cybercrime, harassment, weapon synthesis), leaving a systematic blind spot for a distinct and commercially consequential threat cate…

First, Do No Harm (With LLMs): Mitigating Racial Bias via Agentic Workflows

Sihao Xing, Zaur Gouliev · 2026

Large language models (LLMs) are increasingly used in clinical settings, raising concerns about racial bias in both generated medical text and clinical reasoning. Existing studies have identified bias…

Navigating the Conceptual Multiverse

Andre Ye, Jenny Y. Huang, Alicia Guo, Rose Novick, Tamara Broderick, Mitchell L. Gordon · 2026

When language models answer open-ended problems, they implicitly make hidden decisions that shape their outputs, leaving users with uncontextualized answers rather than a working map of the problem; d…

