Ecole Polytechnique in AI & Data Science — Research Repository

AI & Data Science · Preprint

On the Learning Curves of Revenue Maximization

Steve Hanneke, Alkis Kalavasis, Shay Moran, Grigoris Velegkas · 2026

Learning curves are a fundamental primitive in supervised learning, describing how an algorithm's performance improves with more data and providing a quantitative measure of its generalization ability…



OGER: A Robust Offline-Guided Exploration Reward for Hybrid Reinforcement Learning

Xinyu Ma, Mingzhou Xu, Xuebo Liu, Chang Jin, Qiang Wang, Derek F. Wong, Min Zhang · 2026

Recent advancements in Reinforcement Learning with Verifiable Rewards (RLVR) have significantly improved Large Language Model (LLM) reasoning, yet models often struggle to explore novel trajectories b…



PMMA: The Polytechnique Montreal Mobility Aids Dataset

Qingwu Liu, Nicolas Saunier, Guillaume-Alexandre Bilodeau · 2026

This study introduces a new object detection dataset of pedestrians using mobility aids, named PMMA. The dataset was collected in an outdoor environment, where volunteers used wheelchairs, canes, and …



SR4-Fit: An Interpretable and Informative Classification Algorithm Applied to Prediction of U.S. House of Representatives Elections

Shyam Sundar Murali Krishnan, Dean Frederick Hougen · 2026

The growth of machine learning demands interpretable models for critical applications, yet most high-performing models are "black-box" systems that obscure input-output relationships, while traditio…



Low-Resource Dialect Adaptation of Large Language Models: A French Dialect Case-Study

Eeham Khan, Firas Saidani, Owen Van Esbroeck, Richard Khoury, Leila Kosseim · 2025

Despite the widespread adoption of Large Language Models (LLMs), their strongest capabilities remain largely confined to a small number of high-resource languages for which there is abundant training …



AL-CoLe: Augmented Lagrangian for Constrained Learning

Ignacio Boero, Ignacio Hounie, Alejandro Ribeiro · 2025

Despite the non-convexity of most modern machine learning parameterizations, Lagrangian duality has become a popular tool for addressing constrained learning problems. We revisit Augmented Lagrangian …



Approximating evidence via bounded harmonic means

Dana Naderi, Christian P Robert, Kaniav Kamary, Darren Wraith · 2025

Efficient Bayesian model selection relies on the model evidence or marginal likelihood, whose computation often requires evaluating an intractable integral. The harmonic mean estimator (HME) has long …



COLE: a Comprehensive Benchmark for French Language Understanding Evaluation

David Beauchemin, Yan Tremblay, Mohamed Amine Youssef, Richard Khoury · 2025

To address the need for a more comprehensive evaluation of French Natural Language Understanding (NLU), we introduce COLE, a new benchmark composed of 23 diverse tasks covering a broad range of NLU cap…



Preconditioned Regularized Wasserstein Proximal Sampling

Hong Ye Tan, Stanley Osher, Wuchen Li · 2025

We consider sampling from a Gibbs distribution by evolving finitely many particles. We propose a preconditioned version of a recently proposed noise-free sampling method, governed by approximating the…



Bias-Aware Mislabeling Detection via Decoupled Confident Learning

Yunyi Li, Maria De-Arteaga, Maytal Saar-Tsechansky · 2025

Reliable data is a cornerstone of modern organizational systems. A notable data integrity challenge stems from label bias, which refers to systematic errors in a label, a covariate that is central to …



Rethinking Layered Graphic Design Generation with a Top-Down Approach

Jingye Chen, Zhaowen Wang, Nanxuan Zhao, Li Zhang, Difan Liu, Jimei Yang, Qifeng Chen · 2025

Graphic design is crucial for conveying ideas and messages. Designers usually organize their work into objects, backgrounds, and vectorized text layers to simplify editing. However, this workflow dema…



Beyond Accuracy: EcoL2 Metric for Sustainable Neural PDE Solvers

Taniya Kapoor, Abhishek Chandra, Anastasios Stamou, Stephen J Roberts · 2025

Real-world systems, from aerospace to railway engineering, are modeled with partial differential equations (PDEs) describing the physics of the system. Estimating robust solutions for such problems is…



HJ-sampler: A Bayesian sampler for inverse problems of a stochastic process by leveraging Hamilton-Jacobi PDEs and score-based generative models

Tingwei Meng, Zongren Zou, Jerome Darbon, George Em Karniadakis · 2024

The interplay between stochastic processes and optimal control has been extensively explored in the literature. With the recent surge in the use of diffusion models, stochastic processes have increasi…



OpenCOLE: Towards Reproducible Automatic Graphic Design Generation

Naoto Inoue, Kento Masui, Wataru Shimoda, Kota Yamaguchi · 2024

Automatic generation of graphic designs has recently received considerable attention. However, the state-of-the-art approaches are complex and rely on proprietary datasets, which creates reproducibili…



Measuring Dependence between Events

Marc-Oliver Pohle, Timo Dimitriadis, Jan-Lukas Wermuth · 2024

Measuring dependence between two events, or equivalently between two binary random variables, amounts to expressing the dependence structure inherent in a $2\times 2$ contingency table in a real numbe…



Wasserstein proximal operators describe score-based generative models and resolve memorization

Benjamin J. Zhang, Siting Liu, Wuchen Li, Markos A. Katsoulakis, Stanley J. Osher · 2024

We focus on the fundamental mathematical structure of score-based generative models (SGMs). We first formulate SGMs in terms of the Wasserstein proximal operator (WPO) and demonstrate that, via mean-f…



Clustering Pseudo Language Family in Multilingual Translation Models with Fisher Information Matrix

Xinyu Ma, Xuebo Liu, Min Zhang · 2023

In multilingual translation research, the comprehension and utilization of language families are of paramount importance. Nevertheless, clustering languages based solely on their ancestral families ca…



COLE: A Hierarchical Generation Framework for Multi-Layered and Editable Graphic Design

Peidong Jia, Chenxuan Li, Yuhui Yuan, Zeyu Liu, Yichao Shen, Bohan Chen, Xingru Chen, Yinglin Zheng, Dong Chen, Ji Li, Xiaodong Xie, Shanghang Zhang, Baining Guo · 2023

Graphic design, which has been evolving since the 15th century, plays a crucial role in advertising. The creation of high-quality designs demands design-oriented planning, reasoning, and layer-wise ge…



Class Binarization to NeuroEvolution for Multiclass Classification

Gongjin Lan, Zhenyu Gao, Lingyao Tong, Ting Liu · 2023

Multiclass classification is a fundamental and challenging task in machine learning. The existing techniques of multiclass classification can be categorized as (i) decomposition into binary (ii) exten…



Mitigating Label Bias via Decoupled Confident Learning

Yunyi Li, Maria De-Arteaga, Maytal Saar-Tsechansky · 2023

Growing concerns regarding algorithmic fairness have led to a surge in methodologies to mitigate algorithmic bias. However, such methodologies largely assume that observed labels in training data are …

