Regularized Q-Learning with Linear Function Approximation

Jiachen Xi, Alfredo Garcia, Petar Momcilovic

Abstract

Regularized Markov Decision Processes (MDPs) serve as models of sequential decision making under uncertainty wherein the decision maker has limited information processing capacity and/or aversion to model ambiguity. With function approximation, the convergence properties of learning algorithms for regularized MDPs (e.g., soft Q-learning) are not well understood because the composition of the regularized Bellman operator and a projection onto the span of basis vectors is not a contraction with respect to any norm. In this paper, we consider a bi-level optimization formulation of regularized Q-learning with linear function approximation. The lower-level optimization problem aims to identify a value function approximation that satisfies Bellman's recursive optimality condition, and the upper level aims to find the projection onto the span of basis vectors. This formulation motivates a single-loop algorithm with finite-time convergence guarantees. The algorithm operates on two time-scales: updates to the projection of state-action values are 'slow' in that they are implemented with a step size that is smaller than the one used for 'faster' updates of approximate solutions to Bellman's recursive optimality equation. We show that, under certain assumptions, the proposed algorithm converges to a stationary point in the presence of Markovian noise. In addition, we provide a performance guarantee for the policies derived from the proposed algorithm.
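
To make two ingredients of the abstract concrete, the following is a minimal sketch, not the authors' algorithm. It assumes entropy regularization with temperature `tau` (the soft Q-learning case), for which the regularized Bellman backup is a log-sum-exp over actions. On its own, without the projection, that operator is a sup-norm contraction with modulus `gamma`; the difficulty described above arises only once it is composed with a projection. The helper names (`soft_value`, `soft_bellman`, `random_mdp`, `step_sizes`) and the step-size exponents are hypothetical choices for illustration and are not taken from the paper.

```python
import math
import random


def soft_value(q_row, tau):
    """Entropy-regularized state value: tau * log(sum_a exp(q_a / tau)).

    Computed with the usual max-shift for numerical stability.
    """
    m = max(q_row)
    return m + tau * math.log(sum(math.exp((q - m) / tau) for q in q_row))


def soft_bellman(Q, P, R, gamma, tau):
    """Apply the (unprojected) soft Bellman operator to a tabular Q.

    Q[s][a] is a state-action value, P[s][a] a list of next-state
    probabilities, and R[s][a] the expected reward.
    """
    V = [soft_value(row, tau) for row in Q]
    return [
        [R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
         for a in range(len(Q[s]))]
        for s in range(len(Q))
    ]


def sup_dist(Q1, Q2):
    """Sup-norm distance between two tabular Q functions."""
    return max(abs(x - y) for r1, r2 in zip(Q1, Q2) for x, y in zip(r1, r2))


def random_mdp(n_states, n_actions, rng):
    """Hypothetical random MDP, used only to exercise the operator."""
    P, R = [], []
    for _ in range(n_states):
        P_s, R_s = [], []
        for _ in range(n_actions):
            w = [rng.random() + 1e-3 for _ in range(n_states)]
            z = sum(w)
            P_s.append([x / z for x in w])
            R_s.append(rng.random())
        P.append(P_s)
        R.append(R_s)
    return P, R


def step_sizes(k, slow_exp=1.0, fast_exp=0.6):
    """Illustrative two-time-scale schedules (exponents are assumptions).

    Returns (slow, fast): the slow step is for the projection update,
    the fast step for the Bellman-equation update; slow / fast -> 0.
    """
    return 1.0 / (k + 1) ** slow_exp, 1.0 / (k + 1) ** fast_exp


if __name__ == "__main__":
    rng = random.Random(0)
    gamma, tau = 0.9, 0.5
    P, R = random_mdp(5, 3, rng)
    Q1 = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(5)]
    Q2 = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(5)]
    before = sup_dist(Q1, Q2)
    after = sup_dist(soft_bellman(Q1, P, R, gamma, tau),
                     soft_bellman(Q2, P, R, gamma, tau))
    print("contraction holds:", after <= gamma * before + 1e-12)
    print("step sizes at k=100 (slow, fast):", step_sizes(100))
```

The contraction check applies only to the unprojected operator; it says nothing about the projected composition, which is the case the paper addresses through its bi-level formulation.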

Keywords

Artificial Intelligence & Data Science

Paper Details

Authors Jiachen Xi, Alfredo Garcia, Petar Momcilovic
Published 2024-01-26
Category Artificial Intelligence And Data Science
Status Non-peer-reviewed Preprint
Language English
Word Count 211
