Reinforcement Learning Algorithm Selection

Romain Laroche, Raphael Feraud

Abstract

This paper formalises the problem of online algorithm selection in the context of Reinforcement Learning (RL). The setup is as follows: given an episodic task and a finite number of off-policy RL algorithms, a meta-algorithm has to decide which RL algorithm is in control during the next episode so as to maximise the expected return. The paper presents a novel meta-algorithm, called Epochal Stochastic Bandit Algorithm Selection (ESBAS). Its principle is to freeze the policy updates at each epoch, and to leave a rebooted stochastic bandit in charge of the algorithm selection. Under some assumptions, a thorough theoretical analysis demonstrates its near-optimality given the structural sampling budget limitations. ESBAS is first evaluated empirically on a dialogue task, where it is shown to outperform each individual algorithm in most configurations. ESBAS is then adapted to a true online setting, where algorithms update their policies after each transition; this adaptation is called SSBAS. SSBAS is evaluated on a fruit collection task, where it is shown to adapt the step-size parameter more efficiently than the classical hyperbolic decay, and on an Atari game, where it improves performance by a wide margin.
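
The ESBAS principle stated in the abstract (freeze the policies during an epoch, and let a freshly reset stochastic bandit choose which algorithm controls each episode) can be sketched as below. This is only an illustration under assumptions that the abstract does not state: the bandit is taken to be UCB1, epoch lengths are taken to double, returns are taken to lie in [0, 1], and the `train` method and `run_episode` callable are hypothetical interfaces introduced for the example.

```python
import math


class UCB1:
    """Plain UCB1 stochastic bandit (an assumed choice of bandit)."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms

    def select(self):
        # Play every arm once before using the confidence bound.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        scores = [
            self.sums[a] / self.counts[a]
            + math.sqrt(2.0 * math.log(total) / self.counts[a])
            for a in range(len(self.counts))
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward


def esbas(algorithms, run_episode, n_epochs):
    """Epoch loop: freeze policies, reset the bandit, pick one algorithm per episode.

    algorithms  -- hypothetical objects exposing train(trajectories) -> policy
    run_episode -- hypothetical callable: policy -> (trajectory, return in [0, 1])
    """
    trajectories = []  # shared data, usable by every off-policy algorithm
    selections = []    # index of the algorithm in control at each episode
    for epoch in range(n_epochs):
        # Policies are recomputed once here and stay frozen for the whole epoch.
        policies = [alg.train(list(trajectories)) for alg in algorithms]
        bandit = UCB1(len(algorithms))  # "rebooted": no statistics carried over
        for _ in range(2 ** epoch):     # assumed doubling epoch length
            k = bandit.select()
            trajectory, ret = run_episode(policies[k])
            bandit.update(k, ret)
            trajectories.append(trajectory)
            selections.append(k)
    return selections, trajectories
```

Because every algorithm is off-policy, each one can learn from all trajectories regardless of which algorithm generated them, which is why the sketch keeps a single shared `trajectories` list. Resetting the bandit at each epoch boundary reflects that the frozen policies, and hence the arm reward distributions, change from one epoch to the next.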

Keywords

Artificial Intelligence & Data Science

Paper Details

Authors Romain Laroche, Raphael Feraud
Published 2017-01-30
Category Artificial Intelligence And Data Science
Status Non-peer-reviewed Preprint
Language English
