Fixed Point Explainability

Emanuele La Malfa, Jon Vadillo, Marco Molinari, Michael Wooldridge

Artificial Intelligence And Data Science · Non-peer-reviewed Preprint

Published 2025-05-18

Abstract

This paper introduces a formal notion of fixed point explanations, inspired by the "why regress" principle, to assess, through recursive applications, the stability of the interplay between a model and its explainer. Fixed point explanations satisfy properties like minimality, stability, and faithfulness, revealing hidden model behaviours and explanatory weaknesses. We define convergence conditions for several classes of explainers, from feature-based to mechanistic tools like Sparse AutoEncoders, and we report quantitative and qualitative results for several datasets and models, including LLMs such as Llama-3.3-70B.
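
The idea of recursively applying an explainer until its output stops changing can be illustrated with a minimal sketch. Everything below is a hypothetical toy and not the authors' method: the linear model (`WEIGHTS`, `BIAS`), the input `X`, the `greedy_pruning_explainer`, and the `fixed_point_explanation` helper are assumptions made only for illustration. The explainer takes the current set of retained features and returns a (possibly smaller) set that preserves the model's prediction; the loop stops when the explainer maps a set to itself.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

# Hypothetical toy model: a linear classifier over four features.
WEIGHTS = [2.0, -0.5, 0.1, 1.0]
BIAS = -1.5
X = [1.0, 1.0, 1.0, 1.0]

FeatureSet = FrozenSet[int]
Explainer = Callable[[Sequence[float], FeatureSet], FeatureSet]


def predict(x: Sequence[float], active: FeatureSet) -> bool:
    """Classify x using only the features in `active` (others masked to 0)."""
    return sum(WEIGHTS[i] * x[i] for i in active) + BIAS > 0


def greedy_pruning_explainer(x: Sequence[float], active: FeatureSet) -> FeatureSet:
    """Hypothetical explainer: drop the weakest active feature if the
    full-input prediction is preserved, else return `active` unchanged."""
    if not active:
        return active
    target = predict(x, frozenset(range(len(x))))
    weakest = min(sorted(active), key=lambda i: abs(WEIGHTS[i] * x[i]))
    candidate = active - {weakest}
    return candidate if predict(x, candidate) == target else active


def fixed_point_explanation(
    x: Sequence[float], explainer: Explainer, max_iters: int = 50
) -> Tuple[FeatureSet, List[FeatureSet]]:
    """Re-apply `explainer` to its own output until the feature set is stable."""
    current: FeatureSet = frozenset(range(len(x)))
    trace = [current]
    for _ in range(max_iters):
        nxt = explainer(x, current)
        if nxt == current:  # explainer maps the set to itself: a fixed point
            return current, trace
        current = nxt
        trace.append(current)
    raise RuntimeError("no fixed point reached within max_iters")


fixed_point, trace = fixed_point_explanation(X, greedy_pruning_explainer)
# trace shrinks {0, 1, 2, 3} -> {0, 1, 3} -> {0, 3} -> {0}; the fixed point is {0}
```

In this toy setting the fixed point is also minimal, since removing feature 0 flips the prediction. With a non-monotone or stochastic explainer the loop may cycle or fail to converge, which is why the sketch caps the number of iterations.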

Keywords

Artificial Intelligence & Data Science

Paper Details

Authors: Emanuele La Malfa, Jon Vadillo, Marco Molinari, Michael Wooldridge
Published: 2025-05-18
Category: Artificial Intelligence And Data Science
Status: Non-peer-reviewed Preprint
Language: English
Word Count: 83
