Introduction
Bimonthly, started in 1957
Administrator
Shanxi Provincial Education Department
Sponsor
Taiyuan University of Technology
Publisher
Ed. Office of Journal of TYUT
Editor-in-Chief
SUN Hongbin
ISSN: 1007-9432
CN: 14-1220/N

Deep Reinforcement Learning with Phasic Policy Gradient with Sample Reuse
DOI:
10.16355/j.tyut.1007-9432.20230300
abstract:
Deep reinforcement learning is a promising area of research that can be applied across multiple domains to solve a variety of complex tasks. The phasic policy gradient with sample reuse (SR-PPG) is proposed to address the problems of non-reuse of samples and low sample utilization in policy-based deep reinforcement learning algorithms. The algorithm introduces offline data into the phasic policy gradient (PPG), thus reducing the time cost of training and enabling the model to converge quickly. In this work, SR-PPG combines the stability advantages of theoretically supported on-policy algorithms with the sample efficiency of off-policy algorithms, develops policy improvement guarantees applicable to the off-policy setting, and links these bounds to the clipping mechanism used by the phasic policy gradient algorithm. A series of theoretical and experimental demonstrations shows that this algorithm achieves better performance by effectively balancing the competing goals of stability and sample efficiency.
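The clipping mechanism the abstract links to the off-policy bounds can be illustrated with a minimal sketch (not the authors' implementation; function name, batch values, and the clip range `eps` are illustrative): when samples are reused, the probability ratio between the new and old policies is clipped so that stale data cannot push the update too far.

```python
import numpy as np

def clipped_surrogate(ratio, advantage, eps=0.2):
    """PPO/PPG-style clipped objective: bounds the contribution of
    reused (off-policy) samples by clipping the probability ratio
    pi_new(a|s) / pi_old(a|s) to [1 - eps, 1 + eps]."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Taking the minimum gives a pessimistic (lower-bound) objective.
    return np.minimum(unclipped, clipped)

# Hypothetical reused batch: ratios and advantage estimates.
ratios = np.array([0.8, 1.0, 1.5])
advantages = np.array([1.0, -0.5, 2.0])
print(clipped_surrogate(ratios, advantages))  # the 1.5 ratio is clipped to 1.2
```

The clipped term is what makes reusing older samples safe: a large ratio signals the sample is far off-policy, and clipping caps the incentive to exploit it.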
Keywords:
deep reinforcement learning; phasic policy gradient; sample reuse