Regularly Updated Deterministic Policy Gradient Algorithm

Shuai Han, Wenbo Zhou, Shuai Lu, Jiayu Yu

Artificial Intelligence And Data Science PDF Available Non-peer-reviewed Preprint

Shuai Han, Wenbo Zhou, Shuai Lu, Jiayu Yu · Published 2020-07-01

Abstract

The Deep Deterministic Policy Gradient (DDPG) algorithm is one of the most well-known reinforcement learning methods. However, it is inefficient and unstable in practical applications. In addition, the bias and variance of the Q estimate in its target function are sometimes difficult to control. To address these problems, this paper proposes a Regularly Updated Deterministic (RUD) policy gradient algorithm. It proves theoretically that the learning procedure with RUD makes better use of new data in the replay buffer than the traditional procedure. Moreover, the low variance of the Q value in RUD makes it better suited to the current Clipped Double Q-learning strategy. The evaluation comprises a comparison experiment against previous methods, an ablation experiment with the original DDPG, and other analytical experiments in MuJoCo environments. The experimental results demonstrate the effectiveness and superiority of RUD.
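
For readers unfamiliar with the Clipped Double Q-learning strategy mentioned in the abstract, the following is a minimal sketch of the target computation as it is commonly formulated in TD3: the bootstrap value is the smaller of two target-critic estimates. It is not the RUD algorithm itself, which the abstract does not specify. The function name, argument names, and default discount factor are assumptions chosen for illustration.

```python
from typing import List, Sequence


def clipped_double_q_target(
    rewards: Sequence[float],
    dones: Sequence[float],
    q1_next: Sequence[float],
    q2_next: Sequence[float],
    gamma: float = 0.99,  # assumed discount factor, not taken from the paper
) -> List[float]:
    """Compute y_i = r_i + gamma * (1 - d_i) * min(Q1'(s'_i, a'_i), Q2'(s'_i, a'_i)).

    q1_next and q2_next are the two target critics' estimates for the next
    state-action pair; dones holds 1.0 for terminal transitions, else 0.0.
    """
    targets = []
    for r, d, q1, q2 in zip(rewards, dones, q1_next, q2_next):
        # Taking the minimum of the two estimates counteracts overestimation bias.
        bootstrap = min(q1, q2)
        # Terminal transitions (d == 1.0) contribute no bootstrap term.
        targets.append(r + gamma * (1.0 - d) * bootstrap)
    return targets
```

Because the target takes a minimum over two estimates, it tends to underestimate when those estimates are noisy. This offers one way to read the abstract's remark that a low-variance Q value pairs well with the strategy.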

Keywords

Artificial Intelligence & Data Science

Paper Details

Authors: Shuai Han, Wenbo Zhou, Shuai Lu, Jiayu Yu
Published: 2020-07-01
Category: Artificial Intelligence And Data Science
Status: Non-peer-reviewed Preprint
Language: English
Word Count: 139
